Abstract

Statistical pattern classification methods based on data-random graphs were introduced recently. In 
this approach, a random directed graph is constructed from the data using the relative positions of the 
data points from various classes. Different random graphs result from different definitions of the prox- 
imity region associated with each data point and different graph statistics can be employed for data 
reduction. The approach used in this article is based on a parameterized family of proximity maps 
determining an associated family of data-random digraphs. The relative arc density of the digraph is 
used as the summary statistic, providing an alternative to the domination number employed previously. 
An important advantage of the relative arc density is that, properly re-scaled, it is a U-statistic,
facilitating analytic study of its asymptotic distribution using standard U-statistic central limit theory. The
approach is illustrated with an application to the testing of spatial patterns of segregation and associa- 
tion. Knowledge of the asymptotic distribution allows evaluation of the Pitman and Hodges-Lehmann 
asymptotic efficacies, and selection of the proximity map parameter to optimize efficiency. Furthermore 
the approach presented here also has the advantage of validity for data in any dimension. 



1 Introduction 



Classification and clustering have received considerable attention in the statistical literature. In recent years, 
a new classification approach has been developed which is based on the relative positions of the data points 
from various classes. Priebe et al. (2001) introduced the class cover catch digraphs (CCCD) in $\mathbb{R}$ and gave the exact
and the asymptotic distribution of the domination number of the CCCD. DeVinney et al. (2002), Marchette
and Priebe (2003), Priebe et al. (2003b), and Priebe et al. (2003a) applied the concept in higher dimensions
and demonstrated relatively good
performance of CCCD in classification. The methods employed involve data reduction (condensing) by using 
approximate minimum dominating sets as prototype sets (since finding the exact minimum dominating set 
is an NP-hard problem — in particular for CCCD). Furthermore the exact and the asymptotic distribution 
of the domination number of the CCCD are not analytically tractable in multiple dimensions. 

Ceyhan and Priebe introduced the central similarity proximity map and r-factor proximity maps and the
associated random digraphs in Ceyhan and Priebe (2003a) and Ceyhan and Priebe (2003b), respectively. In
both cases, the space is partitioned by the Delaunay tessellation, which is the Delaunay triangulation in $\mathbb{R}^2$.
In each triangle, a family of data-random proximity catch digraphs is constructed based on the proximity of
the points to each other. The advantages of the r-factor proximity catch digraphs are that an exact minimum
dominating set can be found in polynomial time and the asymptotic distribution of the domination number
is analytically tractable. The latter is then used to test segregation and association of points of different
classes in Ceyhan and Priebe (2003b). Segregation and association are two patterns that describe the spatial
relation between two or more classes. See Section 2.5 for more detail.



In this article, we employ a different statistic, namely the relative (arc) density, that is, the proportion
of all possible arcs (directed edges) which are present in the data-random digraph. This test statistic has
the advantage that, properly rescaled, it is a U-statistic. Two plain classes of alternative hypotheses, for
segregation and association, are defined in Section 2.5. The asymptotic distributions under both the null and
the alternative hypotheses are determined in Section 3 by using standard U-statistic central limit theory.
Pitman and Hodges-Lehmann asymptotic efficacies are analyzed in Sections 4.3 and 4.4, respectively. This
test is related to the available tests of segregation and association in the ecology literature, such as Pielou's
test and Ripley's test. See the discussion section for more detail. Our approach is valid for data in any
dimension, but for simplicity of expression and visualization, it will be described for two-dimensional data.



2 Preliminaries 



2.1 Proximity Maps 



Let $(\Omega, \mathcal{M})$ be a measurable space and consider a function $N : \Omega \times 2^{\Omega} \to 2^{\Omega}$, where $2^{\Omega}$ represents the power
set of $\Omega$. Then given $\mathcal{Y} \subseteq \Omega$, the proximity map $N_{\mathcal{Y}}(\cdot) = N(\cdot, \mathcal{Y}) : \Omega \to 2^{\Omega}$ associates with each point
$x \in \Omega$ a proximity region $N_{\mathcal{Y}}(x) \subseteq \Omega$. Typically, $N$ is chosen to satisfy $x \in N_{\mathcal{Y}}(x)$ for all $x \in \Omega$. The use of
the adjective proximity comes from thinking of the region $N_{\mathcal{Y}}(x)$ as representing a neighborhood of points
"close" to $x$ (Toussaint (1980); Jaromczyk and Toussaint (1992)).



2.2 r-Factor Proximity Maps 



We now briefly define r-factor proximity maps. (See Ceyhan and Priebe (2003b) for
more details.) Let $\Omega = \mathbb{R}^2$ and let $\mathcal{Y} = \{y_1, y_2, y_3\} \subset \mathbb{R}^2$ be three non-collinear points. Denote by $T(\mathcal{Y})$
the triangle, including the interior, formed by the three points (i.e. $T(\mathcal{Y})$ is the convex hull of $\mathcal{Y}$). For
$r \in [1, \infty]$, define $N_{\mathcal{Y}}^r$ to be the r-factor proximity map as follows; see also Figure 1. Using line segments
from the center of mass (centroid) of $T(\mathcal{Y})$ to the midpoints of its edges, we partition $T(\mathcal{Y})$ into "vertex
regions" $R(y_1)$, $R(y_2)$, and $R(y_3)$. For $x \in T(\mathcal{Y}) \setminus \mathcal{Y}$, let $v(x) \in \mathcal{Y}$ be the vertex in whose region $x$ falls,
so $x \in R(v(x))$. If $x$ falls on the boundary of two vertex regions, we assign $v(x)$ arbitrarily to one of the
adjacent regions. Let $e(x)$ be the edge of $T(\mathcal{Y})$ opposite $v(x)$. Let $\ell(x)$ be the line parallel to $e(x)$ through
$x$. Let $d(v(x), \ell(x))$ be the Euclidean (perpendicular) distance from $v(x)$ to $\ell(x)$. For $r \in [1, \infty)$, let $\ell_r(x)$
be the line parallel to $e(x)$ such that $d(v(x), \ell_r(x)) = r\, d(v(x), \ell(x))$ and $d(\ell(x), \ell_r(x)) < d(v(x), \ell_r(x))$. Let
$T_r(x)$ be the triangle similar to and with the same orientation as $T(\mathcal{Y})$ having $v(x)$ as a vertex and $\ell_r(x)$ as
the opposite edge. Then the r-factor proximity region $N_{\mathcal{Y}}^r(x)$ is defined to be $T_r(x) \cap T(\mathcal{Y})$. Notice that $r \ge 1$
implies $x \in N_{\mathcal{Y}}^r(x)$. Note also that $\lim_{r \to \infty} N_{\mathcal{Y}}^r(x) = T(\mathcal{Y})$ for all $x \in T(\mathcal{Y}) \setminus \mathcal{Y}$, so we define $N_{\mathcal{Y}}^{\infty}(x) = T(\mathcal{Y})$
for all such $x$. For $x \in \mathcal{Y}$, we define $N_{\mathcal{Y}}^r(x) = \{x\}$ for all $r \in [1, \infty]$.
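The construction above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes barycentric coordinates $\lambda_i$ with respect to the vertices, in which the vertex region of $x$ is the one with the largest coordinate and $d(y_i, \ell(x))$ is proportional to $1 - \lambda_i(x)$, so that $z \in N_{\mathcal{Y}}^r(x)$ reduces to $\lambda_i(z) \ge 1 - r(1 - \lambda_i(x))$ for $z \in T(\mathcal{Y})$. The function names `barycentric` and `in_r_factor_region` and the tolerance `tol` are hypothetical.

```python
import numpy as np

def barycentric(p, tri):
    """Barycentric coordinates of point p with respect to triangle tri (3x2)."""
    a, b, c = np.asarray(tri, dtype=float)
    edges = np.column_stack((b - a, c - a))        # 2x2 matrix of edge vectors
    l2, l3 = np.linalg.solve(edges, np.asarray(p, dtype=float) - a)
    return np.array([1.0 - l2 - l3, l2, l3])

def in_r_factor_region(z, x, tri, r, tol=1e-12):
    """True if z lies in the r-factor proximity region of x (sketch, r >= 1)."""
    lx, lz = barycentric(x, tri), barycentric(z, tri)
    if lx.min() < -tol or lz.min() < -tol:
        return False                               # x or z lies outside the triangle
    i = int(np.argmax(lx))                         # vertex region of x; ties go to the lowest index
    if lx[i] >= 1.0 - tol:                         # x is a vertex, so the region is {x}
        return bool(lz[i] >= 1.0 - tol)
    # d(v, l(z)) <= r * d(v, l(x)) written in barycentric form; r may be inf
    return bool(lz[i] >= 1.0 - r * (1.0 - lx[i]) - tol)
```

For example, with the standard equilateral triangle and `x = (0.3, 0.1)`, the point `(0.5, 0.1)` is outside the region for `r = 1` but inside it for `r = 2`.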



2.3 Data-Random Proximity Catch Digraphs 

If $\mathcal{X}_n := \{X_1, X_2, \ldots, X_n\}$ is a set of $\Omega$-valued random variables, then the $N_{\mathcal{Y}}(X_i)$, $i = 1, \ldots, n$, are random
sets. If the $X_i$ are independent and identically distributed, then so are the random sets $N_{\mathcal{Y}}(X_i)$.

In the case of an r-factor proximity map, notice that if $X_i \overset{iid}{\sim} F$ and $F$ has a non-degenerate
two-dimensional probability density function $f$ with $\mathrm{support}(f) \subseteq T(\mathcal{Y})$, then the special case in the construction
of $N_{\mathcal{Y}}^r$ ($X$ falls on the boundary of two vertex regions) occurs with probability zero.

The proximities of the data points to each other are used to construct a digraph. A digraph is a
directed graph; i.e. a graph with directed edges from one vertex to another based on a binary relation.
Define the data-random proximity catch digraph $D$ with vertex set $\mathcal{V} = \{X_1, \ldots, X_n\}$ and arc set $\mathcal{A}$ by
$(X_i, X_j) \in \mathcal{A} \iff X_j \in N_{\mathcal{Y}}(X_i)$. Since this relationship is not symmetric, a digraph is needed rather than
a graph. The random digraph $D$ depends on the (joint) distribution of the $X_i$ and on the map $N_{\mathcal{Y}}$.

Figure 1: Construction of r-factor proximity region, $N_{\mathcal{Y}}^r(x)$ (shaded region).



2.4 Relative Density 

The relative arc density of a digraph $D = (\mathcal{V}, \mathcal{A})$ of order $|\mathcal{V}| = n$, denoted $\rho(D)$, is defined as

$$\rho(D) = \frac{|\mathcal{A}|}{n(n-1)}$$

where $|\cdot|$ denotes the set cardinality functional (Janson et al. (2000)).

Thus $\rho(D)$ represents the ratio of the number of arcs in the digraph $D$ to the number of arcs in the
complete symmetric digraph of order $n$, which is $n(n-1)$. For brevity of notation we use relative density
rather than relative arc density henceforth.

If $X_1, \ldots, X_n \overset{iid}{\sim} F$, the relative density of the associated data-random proximity catch digraph $D$, denoted
$\rho(\mathcal{X}_n; h, N_{\mathcal{Y}})$, is a U-statistic,

$$\rho(\mathcal{X}_n; h, N_{\mathcal{Y}}) = \frac{1}{n(n-1)} \sum\sum_{i<j} h(X_i, X_j; N_{\mathcal{Y}}) \qquad (1)$$

where

$$h(X_i, X_j; N_{\mathcal{Y}}) = I\{(X_i, X_j) \in \mathcal{A}\} + I\{(X_j, X_i) \in \mathcal{A}\} = I\{X_j \in N_{\mathcal{Y}}(X_i)\} + I\{X_i \in N_{\mathcal{Y}}(X_j)\}, \qquad (2)$$

where $I(\cdot)$ is the indicator function. We denote $h(X_i, X_j; N_{\mathcal{Y}})$ as $h_{ij}$ for brevity of notation. Although the
digraph is asymmetric, $h_{ij}$ is defined as the number of arcs in $D$ between vertices $X_i$ and $X_j$, in order to
produce a symmetric kernel with finite variance (Lehmann (1988)).
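The equivalence between the arc-count definition and the U-statistic form in Equation (1) can be checked numerically. The sketch below assumes the digraph is supplied as a boolean adjacency matrix `adj` with `adj[i, j]` true when $(X_i, X_j)$ is an arc; the function names are hypothetical and chosen for illustration only.

```python
import numpy as np

def relative_density(adj):
    """Relative arc density |A| / (n (n - 1)) from a boolean adjacency matrix."""
    adj = np.asarray(adj, dtype=bool).copy()
    np.fill_diagonal(adj, False)                   # loops are never counted as arcs
    n = adj.shape[0]
    return adj.sum() / (n * (n - 1))

def relative_density_ustat(adj):
    """Same quantity via the symmetric kernel h_ij summed over pairs i < j."""
    adj = np.asarray(adj, dtype=bool).copy()
    np.fill_diagonal(adj, False)
    n = adj.shape[0]
    h = adj.astype(int) + adj.T.astype(int)        # h_ij takes values in {0, 1, 2}
    upper = np.triu_indices(n, k=1)                # index pairs with i < j
    return h[upper].sum() / (n * (n - 1))
```

Both functions return the same value for any digraph, since each arc is counted exactly once in the kernel sum over unordered pairs.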

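The random variable $\rho_n := \rho(\mathcal{X}_n; h, N_{\mathcal{Y}})$ depends on $n$ and $N_{\mathcal{Y}}$ explicitly and on $F$ implicitly. The
expectation $\mathbf{E}[\rho_n]$, however, is independent of $n$ and depends on only $F$ and $N_{\mathcal{Y}}$:

$$0 \le \mathbf{E}[\rho_n] = \frac{1}{2}\mathbf{E}[h_{12}] \le 1 \text{ for all } n \ge 2. \qquad (3)$$

The variance $\mathbf{Var}[\rho_n]$ simplifies to

$$0 \le \mathbf{Var}[\rho_n] = \frac{1}{2n(n-1)}\mathbf{Var}[h_{12}] + \frac{n-2}{n(n-1)}\mathbf{Cov}[h_{12}, h_{13}] \le 1/4. \qquad (4)$$

A central limit theorem for U-statistics (Lehmann (1988)) yields

$$\sqrt{n}\,(\rho_n - \mathbf{E}[\rho_n]) \xrightarrow{\mathcal{L}} \mathcal{N}(0, \mathbf{Cov}[h_{12}, h_{13}]) \qquad (5)$$

provided $\mathbf{Cov}[h_{12}, h_{13}] > 0$. The asymptotic variance of $\rho_n$, $\mathbf{Cov}[h_{12}, h_{13}]$, depends on only $F$ and $N_{\mathcal{Y}}$.
Thus, we need determine only $\mathbf{E}[h_{12}]$ and $\mathbf{Cov}[h_{12}, h_{13}]$ in order to obtain the normal approximation

$$\rho_n \overset{approx}{\sim} \mathcal{N}(\mathbf{E}[\rho_n], \mathbf{Var}[\rho_n]) = \mathcal{N}\left(\frac{\mathbf{E}[h_{12}]}{2}, \frac{\mathbf{Cov}[h_{12}, h_{13}]}{n}\right) \text{ for large } n. \qquad (6)$$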



2.5 Null and Alternative Hypotheses 

In a two class setting, the phenomenon known as segregation occurs when members of one class have a
tendency to repel members of the other class. For instance, it may be the case that one type of plant does
not grow well in the vicinity of another type of plant, and vice versa. This implies, in our notation, that the $X_i$
are unlikely to be located near any elements of $\mathcal{Y}$. Alternatively, association occurs when members of one
class have a tendency to attract members of the other class, as in symbiotic species, so that the $X_i$ will tend
to cluster around the elements of $\mathcal{Y}$, for example. See, for instance, Dixon (1994) and Coomes et al. (1999). The
null hypothesis for spatial patterns has been a controversial topic in ecology from the early days. Gotelli
and Graves (1996) have collected a voluminous literature to present a comprehensive
analysis of the use and misuse of null models in the ecology community. They also define and attempt to clarify
the null model concept as "a pattern-generating model that is based on randomization of ecological data or
random sampling from a known or imagined distribution. . . . The randomization is designed to produce
a pattern that would be expected in the absence of a particular ecological mechanism." In other words, the
hypothesized null models can be viewed as "thought experiments," which are conventionally used in the physical
sciences, and these models provide a statistical baseline for the analysis of the patterns. For statistical testing
for segregation and association, the null hypothesis we consider is a type of complete spatial randomness;
that is,

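$$H_0 : X_i \overset{iid}{\sim} \mathcal{U}(T(\mathcal{Y}))$$

where $\mathcal{U}(T(\mathcal{Y}))$ is the uniform distribution on $T(\mathcal{Y})$. If it is desired to have the sample size be a random
variable, we may consider a spatial Poisson point process on $T(\mathcal{Y})$ as our null hypothesis.

We define two classes of alternatives, $H^S_{\varepsilon}$ and $H^A_{\varepsilon}$ with $\varepsilon \in (0, \sqrt{3}/3)$, for segregation and association,
respectively. For $y \in \mathcal{Y}$, let $e(y)$ denote the edge of $T(\mathcal{Y})$ opposite vertex $y$, and for $x \in T(\mathcal{Y})$ let $\ell_y(x)$ denote
the line parallel to $e(y)$ through $x$. Then define $T(y, \varepsilon) = \{x \in T(\mathcal{Y}) : d(y, \ell_y(x)) \le \varepsilon\}$. Let $H^S_{\varepsilon}$ be the model
under which $X_i \overset{iid}{\sim} \mathcal{U}\big(T(\mathcal{Y}) \setminus \cup_{y \in \mathcal{Y}} T(y, \varepsilon)\big)$ and $H^A_{\varepsilon}$ be the model under which $X_i \overset{iid}{\sim} \mathcal{U}\big(\cup_{y \in \mathcal{Y}} T(y, \sqrt{3}/3 - \varepsilon)\big)$.
Thus the segregation model excludes the possibility of any $X_i$ occurring near a $y_j$, and the association model
requires that all $X_i$ occur near a $y_j$. The $\sqrt{3}/3 - \varepsilon$ in the definition of the association alternative is so that
$\varepsilon = 0$ yields $H_0$ under both classes of alternatives.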
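To make the two alternatives concrete, the following sketch draws points from $H^S_{\varepsilon}$ or $H^A_{\varepsilon}$ on the standard equilateral triangle by rejection sampling. It is an illustration under stated assumptions rather than the procedure used for the experiments reported here: uniform points are generated through Dirichlet(1, 1, 1) barycentric coordinates, and the distance $d(y, \ell_y(x))$ is computed as $(1 - \lambda_y(x))$ times the triangle height $\sqrt{3}/2$. The names `sample_alternative`, `TE`, and `kind` are hypothetical.

```python
import numpy as np

SQRT3 = np.sqrt(3.0)
# Standard equilateral triangle with vertices (0,0), (1,0), (1/2, sqrt(3)/2).
TE = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, SQRT3 / 2]])

def sample_alternative(n, eps, kind, rng):
    """Draw n points under the segregation or association alternative (sketch)."""
    if not 0.0 <= eps < SQRT3 / 3:
        raise ValueError("eps must lie in [0, sqrt(3)/3)")
    height = SQRT3 / 2
    kept, total = [], 0
    while total < n:                               # may loop many times if the support is small
        lam = rng.dirichlet(np.ones(3), size=max(2 * n, 64))  # uniform on the triangle
        dist = (1.0 - lam) * height                # d(y, l_y(x)) for each of the three vertices
        if kind == "segregation":
            ok = (dist > eps).all(axis=1)          # outside every corner region T(y, eps)
        elif kind == "association":
            ok = (dist <= SQRT3 / 3 - eps).any(axis=1)  # inside some T(y, sqrt(3)/3 - eps)
        else:
            raise ValueError("kind must be 'segregation' or 'association'")
        kept.append(lam[ok])
        total += int(ok.sum())
    lam = np.concatenate(kept)[:n]
    return lam @ TE                                # convert back to Cartesian coordinates
```

With `eps = 0` both choices of `kind` reduce to uniform sampling on the triangle, which mirrors the statement that $\varepsilon = 0$ yields the null hypothesis.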

Remark: These definitions of the alternatives are given for the standard equilateral triangle. The
geometry invariance result of Theorem 1 from Section 3 still holds under the alternatives, in the following
sense. If, in an arbitrary triangle, a small percentage $\delta \cdot 100\%$, where $\delta \in (0, 4/9)$, of the area is carved away as
forbidden from each vertex using line segments parallel to the opposite edge, then under the transformation to
the standard equilateral triangle this will result in the alternative $H^S_{\sqrt{3\delta/4}}$. This argument is for segregation
with $\delta < 1/4$; a similar construction is available for the other cases.



3 Asymptotic Normality Under the Null and Alternative Hypotheses



First we present a "geometry invariance" result which allows us to assume $T(\mathcal{Y})$ is the standard equilateral
triangle, $T((0, 0), (1, 0), (1/2, \sqrt{3}/2))$, thereby simplifying our subsequent analysis.

Theorem 1: Let $\mathcal{Y} = \{y_1, y_2, y_3\} \subset \mathbb{R}^2$ be three non-collinear points. For $i = 1, \ldots, n$ let $X_i \overset{iid}{\sim}
F = \mathcal{U}(T(\mathcal{Y}))$, the uniform distribution on the triangle $T(\mathcal{Y})$. Then for any $r \in [1, \infty]$ the distribution of
$\rho(\mathcal{X}_n; h, N_{\mathcal{Y}}^r)$ is independent of $\mathcal{Y}$, hence of the geometry of $T(\mathcal{Y})$.

Proof: A composition of translation, rotation, reflections, and scaling will transform any given
triangle $T_o = T(y_1, y_2, y_3)$ into the "basic" triangle $T_b = T((0, 0), (1, 0), (c_1, c_2))$ with $0 < c_1 \le 1/2$, $c_2 > 0$,
and $(1 - c_1)^2 + c_2^2 \le 1$, preserving uniformity. The transformation $\phi_e : \mathbb{R}^2 \to \mathbb{R}^2$ given by
$\phi_e(u, v) = \left(u + \frac{1 - 2c_1}{2c_2}\, v, \frac{\sqrt{3}}{2c_2}\, v\right)$ takes $T_b$ to the equilateral triangle $T_e = T((0, 0), (1, 0), (1/2, \sqrt{3}/2))$. Investigation of
the Jacobian shows that $\phi_e$ also preserves uniformity. Furthermore, the composition of $\phi_e$ with the rigid
motion transformations maps the boundary of the original triangle $T_o$ to the boundary of the equilateral
triangle $T_e$, the median lines of $T_o$ to the median lines of $T_e$, and lines parallel to the edges of $T_o$ to lines
parallel to the edges of $T_e$. Since the joint distribution of any collection of the $h_{ij}$ involves only probability
content of unions and intersections of regions bounded by precisely such lines, and the probability content
of such regions is preserved since uniformity is preserved, the desired result follows. ■

Based on Theorem 1 and our uniform null hypothesis, we may assume that $T(\mathcal{Y})$ is the standard
equilateral triangle with $\mathcal{Y} = \{(0, 0), (1, 0), (1/2, \sqrt{3}/2)\}$ henceforth.

For our r-factor proximity map and uniform null hypothesis, the asymptotic null distribution of $\rho_n(r) =
\rho(\mathcal{X}_n; h, N_{\mathcal{Y}}^r)$ can be derived as a function of $r$. Let $\mu(r) := \mathbf{E}[\rho_n(r)]$ and $\nu(r) := \mathbf{Cov}[h_{12}, h_{13}]$. Notice that
$\mu(r) = \mathbf{E}[h_{12}]/2 = P(X_2 \in N_{\mathcal{Y}}^r(X_1))$ is the probability of an arc occurring between any pair of vertices.



3.1 Asymptotic Normality under the Null Hypothesis 

By detailed geometric probability calculations, provided in Appendix 1, the mean and the asymptotic
variance of the relative density of the r-factor proximity catch digraph can explicitly be computed. The central
limit theorem for U-statistics then establishes the asymptotic normality under the uniform null hypothesis.
These results are summarized in the following theorem.

Theorem 2: For $r \in [1, \infty)$,

$$\frac{\sqrt{n}\,(\rho_n(r) - \mu(r))}{\sqrt{\nu(r)}} \xrightarrow{\mathcal{L}} \mathcal{N}(0, 1) \qquad (7)$$

where

$$\mu(r) = \begin{cases} \frac{37}{216}\, r^2 & \text{for } r \in [1, 3/2), \\ -\frac{1}{8}\, r^2 + 4 - 8\, r^{-1} + \frac{9}{2}\, r^{-2} & \text{for } r \in [3/2, 2), \\ 1 - \frac{3}{2}\, r^{-2} & \text{for } r \in [2, \infty), \end{cases} \qquad (8)$$

and

$$\nu(r) = \nu_1(r)\, I(r \in [1, 4/3)) + \nu_2(r)\, I(r \in [4/3, 3/2)) + \nu_3(r)\, I(r \in [3/2, 2)) + \nu_4(r)\, I(r \in [2, \infty]) \qquad (9)$$

with

$$\nu_1(r) = \frac{3007 r^{10} - 13824 r^9 + 898 r^8 + 77760 r^7 - 117953 r^6 + 48888 r^5 - 24246 r^4 + 60480 r^3 - 38880 r^2 + 3888}{58320\, r^4},$$

$$\nu_2(r) = \frac{5467 r^{10} - 37800 r^9 + 61912 r^8 + 46588 r^6 - 191520 r^5 + 13608 r^4 + 241920 r^3 - 155520 r^2 + 15552}{233280\, r^4},$$

$$\nu_3(r) = -\frac{7 r^{12} - 72 r^{11} + 312 r^{10} - 5332 r^8 + 15072 r^7 + 13704 r^6 - 139264 r^5 + 273600 r^4 - 242176 r^3 + 103232 r^2 - 27648 r + 8640}{960\, r^6},$$

$$\nu_4(r) = \frac{15 r^4 - 11 r^2 - 48 r + 25}{15\, r^6}.$$

For $r = \infty$, $\rho_n(r)$ is degenerate.
See Appendix 1 for the proof.

Consider the form of the mean and variance functions, which are depicted in Figure 2. Note that $\mu(r)$
is monotonically increasing in $r$, since the proximity region of any data point increases with $r$. In addition,
$\mu(r) \to 1$ as $r \to \infty$, since the digraph becomes complete asymptotically, which explains why $\rho_n(r)$ is
degenerate, i.e. $\nu(r) = 0$, when $r = \infty$. Note also that $\mu(r)$ is continuous, with the value at $r = 1$ being
$\mu(1) = 37/216$.

Regarding the asymptotic variance, note that $\nu(r)$ is continuous in $r$ with $\lim_{r \to \infty} \nu(r) = 0$ and $\nu(1) =
34/58320 \approx .000583$, and observe that $\sup_{r \ge 1} \nu(r) \approx .1305$ at $\operatorname{argsup}_{r \ge 1} \nu(r) \approx 2.045$.
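The piecewise mean function is simple enough to evaluate directly. The sketch below codes $\mu(r)$ from Equation (8) and, for $r \ge 2$ only, the variance branch $\nu_4(r)$; the remaining variance branches are omitted to keep the example short. The function names `mu` and `nu_large_r` are hypothetical.

```python
import math

def mu(r):
    """Asymptotic null mean of the relative density, Equation (8)."""
    if r < 1:
        raise ValueError("r must be at least 1")
    if math.isinf(r):
        return 1.0                                 # the digraph is complete in the limit
    if r < 1.5:
        return 37.0 / 216.0 * r ** 2
    if r < 2.0:
        return -r ** 2 / 8.0 + 4.0 - 8.0 / r + 4.5 / r ** 2
    return 1.0 - 1.5 / r ** 2

def nu_large_r(r):
    """Asymptotic null variance for r >= 2 only (the branch nu_4 of Equation (9))."""
    if r < 2:
        raise ValueError("this sketch covers only r >= 2")
    if math.isinf(r):
        return 0.0                                 # degenerate case
    return (15 * r ** 4 - 11 * r ** 2 - 48 * r + 25) / (15 * r ** 6)
```

Evaluating at $r = 2$ gives `mu(2) = 0.625` and `nu_large_r(2) = 25/192`, the values used in the illustration of the limiting distribution for $r = 2$.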





Figure 2: Asymptotic null mean $\mu(r)$ (left) and variance $\nu(r)$ (right), from Equations (8) and (9) in Theorem
2, respectively. The vertical lines indicate the endpoints of the intervals in the piecewise definition of the
functions. Notice that the vertical axes are differently scaled.



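To illustrate the limiting distribution, $r = 2$ yields

$$\frac{\sqrt{n}\,(\rho_n(2) - \mu(2))}{\sqrt{\nu(2)}} = \sqrt{\frac{192\, n}{25}} \left(\rho_n(2) - \frac{5}{8}\right) \xrightarrow{\mathcal{L}} \mathcal{N}(0, 1)$$

or equivalently

$$\rho_n(2) \overset{approx}{\sim} \mathcal{N}\left(\frac{5}{8}, \frac{25}{192\, n}\right).$$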



Figure 3 indicates that, for $r = 2$, the normal approximation is accurate even for small $n$ (although
kurtosis may be indicated for $n = 10$). Figure 4 demonstrates, however, that severe skewness obtains for
small values of $n$ and extreme values of $r$. The finite sample variance in Equation (4) and skewness may be
derived analytically in much the same way as was $\mathbf{Cov}[h_{12}, h_{13}]$ for the asymptotic variance. In fact, the
exact distribution of $\rho_n(r)$ is, in principle, available by successively conditioning on the values of the $X_i$.
Alas, while the joint distribution of $h_{12}, h_{13}$ is available, the joint distribution of $\{h_{ij}\}_{1 \le i < j \le n}$, and hence the
calculation for the exact distribution of $\rho_n(r)$, is extraordinarily tedious and lengthy for even small values of
$n$.




Figure 3: Depicted are the distributions of $\rho_n(2) \overset{approx}{\sim} \mathcal{N}\left(\frac{5}{8}, \frac{25}{192 n}\right)$ for $n = 10, 20, 100$ (left to right).
Histograms are based on 1000 Monte Carlo replicates. Solid curves represent the approximating normal
densities given by Theorem 2. Again, note that the vertical axes are differently scaled.




Figure 4: Depicted are the histograms for 10,000 Monte Carlo replicates of $\rho_{10}(1)$ (left) and $\rho_{10}(5)$ (right)
indicating severe small sample skewness for extreme values of $r$.

Letting $H_n(r) = \sum_{i=1}^{n} h(X_i, X_{n+1})$, the exact distribution of $\rho_n(r)$ can be evaluated based on the
recurrence

$$(n + 1)\, n\, \rho_{n+1}(r) = n(n - 1)\, \rho_n(r) + H_n(r)$$

by noting that the conditional random variable $H_n(r) \mid X_{n+1}$ is the sum of $n$ independent and identically
distributed random variables. Alas, this calculation is also tedious for large $n$.
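The recurrence is a bookkeeping identity on arc counts and can be verified on any digraph. The sketch below assumes a boolean adjacency matrix in which the last row and column belong to the newly added vertex $X_{n+1}$; `update_density` is a hypothetical helper name.

```python
import numpy as np

def relative_density(adj):
    """Relative arc density |A| / (n (n - 1)) of a boolean adjacency matrix."""
    adj = np.asarray(adj, dtype=bool).copy()
    np.fill_diagonal(adj, False)
    n = adj.shape[0]
    return adj.sum() / (n * (n - 1))

def update_density(rho_n, n, h_n):
    """Relative density on n + 1 vertices from rho_n and H_n via the recurrence."""
    return (n * (n - 1) * rho_n + h_n) / ((n + 1) * n)

# Hypothetical digraph on 6 vertices; the last vertex plays the role of X_{n+1}.
rng = np.random.default_rng(1)
adj = rng.random((6, 6)) < 0.4
np.fill_diagonal(adj, False)
rho_5 = relative_density(adj[:5, :5])
h_5 = adj[5, :5].sum() + adj[:5, 5].sum()          # arcs between the new vertex and the first five
rho_6 = update_density(rho_5, 5, h_5)
```

Here `rho_6` agrees with `relative_density(adj)` computed from scratch.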



3.2 Asymptotic Normality Under the Alternatives 

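Asymptotic normality of the relative density of the proximity catch digraphs under the alternative hypotheses of
segregation and association can be established by the same method as under the null hypothesis. Let $\mathbf{E}^S_{\varepsilon}[\cdot]$
($\mathbf{E}^A_{\varepsilon}[\cdot]$) be the expectation with respect to the uniform distribution under the segregation (association)
alternatives with $\varepsilon \in (0, \sqrt{3}/3)$.

Theorem 3: Let $\mu_S(r, \varepsilon)$ (and $\mu_A(r, \varepsilon)$) be the mean and $\nu_S(r, \varepsilon)$ (and $\nu_A(r, \varepsilon)$) be the covariance,
$\mathbf{Cov}[h_{12}, h_{13}]$, for $r \in [1, \infty)$ and $\varepsilon \in (0, \sqrt{3}/3)$ under segregation (and association). Then under $H^S_{\varepsilon}$,
$\sqrt{n}\,(\rho_n(r) - \mu_S(r, \varepsilon)) \xrightarrow{\mathcal{L}} \mathcal{N}(0, \nu_S(r, \varepsilon))$ for the values of the pair $(r, \varepsilon)$ for which $\nu_S(r, \varepsilon) > 0$. Likewise,
under $H^A_{\varepsilon}$, $\sqrt{n}\,(\rho_n(r) - \mu_A(r, \varepsilon)) \xrightarrow{\mathcal{L}} \mathcal{N}(0, \nu_A(r, \varepsilon))$ for the values of the pair $(r, \varepsilon)$ for which $\nu_A(r, \varepsilon) > 0$.

Sketch of Proof: Under the alternatives, i.e. $\varepsilon > 0$, $\rho_n(r)$ is a U-statistic with the same symmetric
kernel $h_{ij}$ as in the null case. The mean $\mu_S(r, \varepsilon) = \mathbf{E}_{\varepsilon}[\rho_n(r)] = \mathbf{E}_{\varepsilon}[h_{12}]/2$ (and $\mu_A(r, \varepsilon)$), now a function
of both $r$ and $\varepsilon$, is again in $[0, 1]$. The asymptotic variance $\nu_S(r, \varepsilon) = \mathbf{Cov}_{\varepsilon}[h_{12}, h_{13}]$ (and $\nu_A(r, \varepsilon)$), also a
function of both $r$ and $\varepsilon$, is bounded above by $1/4$, as before. The explicit forms of $\mu_S(r, \varepsilon)$ and $\mu_A(r, \varepsilon)$ are
given, defined piecewise, in Appendix 2. Sample values of $\mu_S(r, \varepsilon)$, $\nu_S(r, \varepsilon)$ and $\mu_A(r, \varepsilon)$, $\nu_A(r, \varepsilon)$ are given in
Appendix 3 for segregation with $\varepsilon = \sqrt{3}/4$ and for association with $\varepsilon = \sqrt{3}/12$. Thus asymptotic normality
obtains provided $\nu_S(r, \varepsilon) > 0$ ($\nu_A(r, \varepsilon) > 0$); otherwise $\rho_n(r)$ is degenerate. Note that under $H^S_{\varepsilon}$,

$$\nu_S(r, \varepsilon) > 0 \text{ for } (r, \varepsilon) \in \left[1, \sqrt{3}/(2\varepsilon)\right) \times \left(0, \sqrt{3}/4\right] \,\cup\, \left[1, \sqrt{3}/\varepsilon - 2\right) \times \left(\sqrt{3}/4, \sqrt{3}/3\right),$$

and under $H^A_{\varepsilon}$,

$$\nu_A(r, \varepsilon) > 0 \text{ for } (r, \varepsilon) \in (1, \infty) \times \left(0, \sqrt{3}/3\right) \,\cup\, \{1\} \times \left(0, \sqrt{3}/12\right). \;■$$

Notice that for the association class of alternatives any $r \in (1, \infty)$ yields asymptotic normality for all
$\varepsilon \in (0, \sqrt{3}/3)$, while for the segregation class of alternatives only $r = 1$ yields this universal asymptotic
normality.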



4 The Test and Analysis 

The relative density of the proximity catch digraph is a test statistic for the segregation/association
alternative; rejecting for extreme values of $\rho_n(r)$ is appropriate since under segregation we expect $\rho_n(r)$ to be large,
while under association we expect $\rho_n(r)$ to be small. Using the test statistic

$$R = \frac{\sqrt{n}\,(\rho_n(r) - \mu(r))}{\sqrt{\nu(r)}} \qquad (10)$$

the asymptotic critical value for the one-sided level $\alpha$ test against segregation is given by

$$z_{1-\alpha} = \Phi^{-1}(1 - \alpha) \qquad (11)$$

where $\Phi(\cdot)$ is the standard normal distribution function. Against segregation, the test rejects for $R > z_{1-\alpha}$
and against association, the test rejects for $R < z_{\alpha}$.
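A short sketch of the decision rule follows. It assumes the caller supplies the observed relative density together with the null mean and asymptotic null variance for the chosen $r$; the function name `density_test` and the returned dictionary keys are hypothetical. The standard normal quantile and distribution function come from Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def density_test(rho_n, n, mu_r, nu_r, alpha=0.05):
    """One-sided tests against segregation and association based on R (sketch)."""
    std = NormalDist()                              # standard normal distribution
    r_stat = math.sqrt(n) * (rho_n - mu_r) / math.sqrt(nu_r)
    return {
        "R": r_stat,
        "reject_segregation": r_stat > std.inv_cdf(1.0 - alpha),  # R > z_{1-alpha}
        "reject_association": r_stat < std.inv_cdf(alpha),        # R < z_alpha
        "p_segregation": 1.0 - std.cdf(r_stat),
        "p_association": std.cdf(r_stat),
    }

# Hypothetical data: n = 100 points, r = 2, observed relative density 0.70.
result = density_test(0.70, 100, 5 / 8, 25 / 192)
```

In this made-up example $R \approx 2.08$, so the level .05 test rejects in favor of segregation but not association.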



4.1 Consistency 

Theorem 4: The test against $H^S_{\varepsilon}$ which rejects for $R > z_{1-\alpha}$ and the test against $H^A_{\varepsilon}$ which rejects for
$R < z_{\alpha}$ are consistent for $r \in [1, \infty)$ and $\varepsilon \in (0, \sqrt{3}/3)$.

Proof: Since the variance of the asymptotically normal test statistic, under both the null and the
alternatives, converges to 0 as $n \to \infty$ (or is degenerate), it remains to show that the mean under the null,
$\mu(r) = \mathbf{E}[\rho_n(r)]$, is less than (greater than) the mean under the alternative, $\mu_S(r, \varepsilon) = \mathbf{E}_{\varepsilon}[\rho_n(r)]$ ($\mu_A(r, \varepsilon)$),
against segregation (association) for $\varepsilon > 0$. Whence it will follow that power converges to 1 as $n \to \infty$.

Detailed analysis of $\mu_S(r, \varepsilon)$ in Appendix 2 indicates that under segregation $\mu_S(r, \varepsilon) > \mu(r)$
for all $\varepsilon > 0$ and $r \in [1, \infty)$. Likewise, detailed analysis of $\mu_A(r, \varepsilon)$ in Appendix 3 indicates that under
association $\mu_A(r, \varepsilon) < \mu(r)$ for all $\varepsilon > 0$ and $r \in [1, \infty)$. Hence the desired result follows for both alternatives.
■

In fact, the analysis of $\mu(r, \varepsilon)$ under the alternatives reveals more than what is required for consistency.
Under segregation, the analysis indicates that $\mu_S(r, \varepsilon_1) < \mu_S(r, \varepsilon_2)$ for $\varepsilon_1 < \varepsilon_2$. Likewise, under association,
the analysis indicates that $\mu_A(r, \varepsilon_1) > \mu_A(r, \varepsilon_2)$ for $\varepsilon_1 < \varepsilon_2$.



4.2 Monte Carlo Power Analysis 




Figure 5: Two Monte Carlo experiments against the segregation alternative $H^S_{\sqrt{3}/8}$. Depicted are kernel
density estimates for $\rho_n(11/10)$ for $n = 10$ (left) and $n = 100$ (right) under the null (solid) and alternative
(dashed).

In Figure 5 we present a Monte Carlo investigation against the segregation alternative $H^S_{\sqrt{3}/8}$ for $r =
11/10$ and $n = 10, 100$. With $n = 10$, the null and alternative probability density functions for $\rho_{10}(1.1)$ are
very similar, implying small power (10,000 Monte Carlo replicates yield $\hat{\beta}^S_{mc} = 0.0787$, which is based on the
empirical critical value). With $n = 100$, there is more separation between null and alternative probability
density functions; for this case, 1000 Monte Carlo replicates yield $\hat{\beta}^S_{mc} = 0.77$. Notice also that the probability
density functions are more skewed for $n = 10$, while approximate normality holds for $n = 100$.

For a given alternative and sample size, we may consider analyzing the power of the test, using the
asymptotic critical value, as a function of the proximity factor $r$. In Figure 6 we present a Monte Carlo
investigation of power against $H^S_{\sqrt{3}/8}$ and $H^S_{\sqrt{3}/4}$ as a function of $r$ for $n = 10$. The empirical significance
level is about .05 for $r = 2, 3$, which have the empirical power $\hat{\beta}^S_{mc}(r, \sqrt{3}/8) \approx .35$ and $\hat{\beta}^S_{mc}(r, \sqrt{3}/4) = 1$. So,
for small sample sizes, moderate values of $r$ are more appropriate for normal approximation, as they yield
the desired significance level, and the more severe the segregation, the higher the power estimate.

In Figure 7 we present a Monte Carlo investigation against the association alternative $H^A_{\sqrt{3}/12}$ for $r =
11/10$ and $n = 10$ and $100$. The analysis is the same as in the analysis of Figure 5. In Figure 8 we present a
Monte Carlo investigation of power against $H^A_{\sqrt{3}/12}$ and $H^A_{5\sqrt{3}/24}$ as a function of $r$ for $n = 10$. The empirical
significance level is about .05 for $r = 3/2, 2, 3, 5$, which have the empirical power $\hat{\beta}^A_{mc}(r, \sqrt{3}/12) \le .35$ with
maximum power at $r = 2$, and $\hat{\beta}^A_{mc}(r, 5\sqrt{3}/24) = 1$ at $r = 3$. So, for small sample sizes, moderate values of
$r$ are more appropriate for normal approximation, as they yield the desired significance level, and the more
severe the association, the higher the power estimate.



4.3 Pitman Asymptotic Efficacy 



Pitman asymptotic efficiency (PAE) provides for an investigation of "local asymptotic power", that is, local around
$H_0$. This involves the limit as $n \to \infty$ as well as the limit as $\varepsilon \to 0$. A detailed discussion of PAE can be
found in Kendall and Stuart (1979) and Eeden (1963). For segregation or association alternatives the PAE
is given by $\mathrm{PAE}(\rho_n(r)) = \frac{\left(\mu^{(k)}(r, \varepsilon = 0)\right)^2}{\nu(r)}$, where $k$ is the minimum order of the derivative with respect to $\varepsilon$ for
which $\mu^{(k)}(r, \varepsilon = 0) \ne 0$. That is, $\mu^{(k)}(r, \varepsilon = 0) \ne 0$ but $\mu^{(l)}(r, \varepsilon = 0) = 0$ for $l = 1, 2, \ldots, k - 1$. Then under
segregation alternative $H^S_{\varepsilon}$ and association alternative $H^A_{\varepsilon}$, the PAE of $\rho_n(r)$ is given by

$$\mathrm{PAE}^S(r) = \frac{\left(\mu_S''(r, \varepsilon = 0)\right)^2}{\nu(r)} \quad \text{and} \quad \mathrm{PAE}^A(r) = \frac{\left(\mu_A''(r, \varepsilon = 0)\right)^2}{\nu(r)},$$

respectively, since $\mu_S'(r, \varepsilon = 0) = \mu_A'(r, \varepsilon = 0) = 0$. Equation (9) provides the denominator; the numerator
requires $\mu(r, \varepsilon)$, which is provided in Appendix 2 under both segregation and association alternatives,
where we only use the intervals of $r$ that do not vanish as $\varepsilon \to 0$.

Figure 6: Monte Carlo power using the asymptotic critical value against segregation alternatives
$H^S_{\sqrt{3}/8}$ (left) and $H^S_{\sqrt{3}/4}$ (right) as a function of $r$, for $n = 10$. The circles represent the empirical
significance levels while triangles represent the empirical power values. The $r$ values plotted are
$1, 11/10, 12/10, 4/3, \sqrt{2}, 3/2, 2, 3, 5, 10$.

Figure 7: Two Monte Carlo experiments against the association alternative $H^A_{\sqrt{3}/12}$. Depicted are kernel
density estimates for $\rho_n(11/10)$ for $n = 10$ (left) and $n = 100$ (right) under the null (solid) and alternative
(dashed).

Figure 8: Monte Carlo power using the asymptotic critical value against association alternatives
$H^A_{\sqrt{3}/12}$ (left) and $H^A_{5\sqrt{3}/24}$ (right) as a function of $r$, for $n = 10$. The $r$ values plotted are
$1, 11/10, 12/10, 4/3, \sqrt{2}, 3/2, 2, 3, 5, 10$.

In Figure 9 we present the PAE as a function of $r$ for both segregation and association. Notice that
$\mathrm{PAE}^S(r = 1) = 160/7 \approx 22.8571$, $\lim_{r \to \infty} \mathrm{PAE}^S(r) = \infty$, $\mathrm{PAE}^A(r = 1) = 174240/17 \approx 10249.4118$,
$\lim_{r \to \infty} \mathrm{PAE}^A(r) = 0$, and $\operatorname{argsup}_{r \in [1, \infty)} \mathrm{PAE}^A(r) \approx 1.006$ with $\sup_{r \in [1, \infty)} \mathrm{PAE}^A(r) \approx 10399.7726$. $\mathrm{PAE}^A(r)$
also has a local supremum at $r_l \approx 1.4356$ with $\mathrm{PAE}^A(r_l) \approx 3630.8932$. Based on the asymptotic efficiency
analysis, we suggest, for large $n$ and small $\varepsilon$, choosing $r$ large for testing against segregation and choosing $r$
small for testing against association.



4.4 Hodges-Lehmann Asymptotic Efficacy 



Hodges-Lehmann asymptotic efficiency (HLAE) of $\rho_n(r)$ (see e.g. Hodges and Lehmann (1956)) under $H^S_{\varepsilon}$
is given by

$$\mathrm{HLAE}^S(r, \varepsilon) := \frac{\left(\mu_S(r, \varepsilon) - \mu(r)\right)^2}{\nu_S(r, \varepsilon)}.$$

HLAE for association is defined similarly. Unlike PAE, HLAE does not involve the limit as $\varepsilon \to 0$. Since
this requires the mean and, especially, the asymptotic variance of $\rho_n(r)$ under an alternative, we investigate
HLAE for specific values of $\varepsilon$. Figure 10 contains a graph of HLAE against segregation as a function of $r$ for
$\varepsilon = \sqrt{3}/8, \sqrt{3}/4, 2\sqrt{3}/7$. See Appendix 3 for explicit forms of $\mu_S(r, \varepsilon)$ and $\nu_S(r, \varepsilon)$ for $\varepsilon = \sqrt{3}/4$.

From Figure 10, we see that, against $H^S_{\varepsilon}$, $\mathrm{HLAE}^S(r, \varepsilon)$ appears to be an increasing function, dependent
on $\varepsilon$, of $r$. Let $r_d(\varepsilon)$ be the minimum $r$ such that $\rho_n(r)$ becomes degenerate under the alternative $H^S_{\varepsilon}$. Then
$r_d(\sqrt{3}/8) = 4$, $r_d(\sqrt{3}/4) = 2$, and $r_d(2\sqrt{3}/7) = 3/2$. In fact, for $\varepsilon \in (0, \sqrt{3}/4]$, $r_d(\varepsilon) = \sqrt{3}/(2\varepsilon)$ and for
$\varepsilon \in (\sqrt{3}/4, \sqrt{3}/3)$, $r_d(\varepsilon) = \sqrt{3}/\varepsilon - 2$. Notice that $\lim_{r \to r_d(\varepsilon)} \mathrm{HLAE}^S(r, \varepsilon) = \infty$, which is in agreement with
$\mathrm{PAE}^S$ as $\varepsilon \to 0$, since as $\varepsilon \to 0$, HLAE becomes PAE and $r_d(\varepsilon) \to \infty$, and under $H_0$, $\rho_n(r)$ is degenerate for
$r = \infty$. So HLAE suggests choosing $r$ large against segregation, but in fact choosing $r$ too large will reduce
power since $r \ge r_d(\varepsilon)$ guarantees the complete digraph under the alternative and, as $r$ increases therefrom,
provides an ever greater probability of seeing the complete digraph under the null.

Figure 9: Pitman asymptotic efficiency against segregation (left) and association (right) as a function of $r$.
Notice that vertical axes are differently scaled.

Figure 10: Hodges-Lehmann asymptotic efficiency against segregation alternative $H^S_{\varepsilon}$ as a function of $r$ for
$\varepsilon = \sqrt{3}/8, \sqrt{3}/4, 2\sqrt{3}/7$ (left to right).

Figure 11 contains a graph of HLAE against association as a function of $r$ for $\varepsilon = 5\sqrt{3}/24, \sqrt{3}/12, \sqrt{3}/21$.
See Appendix 3 for explicit forms of $\mu_A(r, \varepsilon)$ and $\nu_A(r, \varepsilon)$ for $\varepsilon = \sqrt{3}/12$. Notice that since $\nu_A(r = 1, \varepsilon) = 0$ for
$\varepsilon \ge \sqrt{3}/12$, $\mathrm{HLAE}^A(r = 1, \varepsilon) = \infty$ for $\varepsilon \ge \sqrt{3}/12$, and $\lim_{r \to \infty} \mathrm{HLAE}^A(r, \varepsilon) = 0$.

In Figure 11 we see that, against $H^A_{\varepsilon}$, $\mathrm{HLAE}^A(r, \varepsilon)$ has a local supremum for $r$ sufficiently larger than 1.
Let $\tilde{r}$ be the value at which this local supremum is attained. Then $\tilde{r}(5\sqrt{3}/24) \approx 3.2323$, $\tilde{r}(\sqrt{3}/12) \approx 1.5676$,
and $\tilde{r}(\sqrt{3}/21) \approx 1.533$. Note that, as $\varepsilon$ gets smaller, $\tilde{r}$ gets smaller. Furthermore, $\mathrm{HLAE}^A(r = 1, \sqrt{3}/21) < \infty$
and as $\varepsilon \to 0$, $\tilde{r}$ becomes the global supremum, with $\operatorname{argsup}_{r \ge 1} \mathrm{PAE}^A(r) \approx 1.006$.
So, when testing against association, HLAE suggests choosing moderate $r$, whereas PAE suggests choosing
small $r$.

Figure 11: Hodges-Lehmann asymptotic efficiency against association alternative $H^A_{\varepsilon}$ as a function of $r$ for
$\varepsilon = \sqrt{3}/21, \sqrt{3}/12, 5\sqrt{3}/24$ (left to right).



4.5 Asymptotic Power Function Analysis 



The asymptotic power function (see e.g. Kendall and Stuart (1979)) can also be investigated as a function
of $r$, $n$, and $\varepsilon$ using the asymptotic critical value and an appeal to normality. Under a specific segregation
alternative $H^S_{\varepsilon}$, the asymptotic power function is given by

$$\Pi_S(r, n, \varepsilon) = 1 - \Phi\left(\frac{z_{1-\alpha}\sqrt{\nu(r)} + \sqrt{n}\,\left(\mu(r) - \mu_S(r, \varepsilon)\right)}{\sqrt{\nu_S(r, \varepsilon)}}\right)$$

where $z_{1-\alpha} = \Phi^{-1}(1 - \alpha)$. Under $H^A_{\varepsilon}$, we have

$$\Pi_A(r, n, \varepsilon) = \Phi\left(\frac{z_{\alpha}\sqrt{\nu(r)} + \sqrt{n}\,\left(\mu(r) - \mu_A(r, \varepsilon)\right)}{\sqrt{\nu_A(r, \varepsilon)}}\right).$$

Analysis of Figure 12 shows that, against the segregation alternative, a large choice of $r$ is warranted for $n = 100$ but,
for smaller sample size, a more moderate $r$ is recommended. Against $H^A_{\sqrt{3}/12}$, a moderate choice of $r$ is
recommended for both $n = 10$ and $n = 100$. This is in agreement with Monte Carlo investigations.
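The two power functions can be evaluated numerically once the null and alternative moments are known. The sketch below takes those moments as arguments, since the explicit forms under the alternatives are in the appendices and are not reproduced here; the function names and the alternative moments in the usage lines are hypothetical placeholders, not values from this article.

```python
import math
from statistics import NormalDist

_STD = NormalDist()                                 # standard normal distribution

def power_segregation(n, mu0, nu0, mu_alt, nu_alt, alpha=0.05):
    """Asymptotic power of the level-alpha test against a segregation alternative."""
    z = _STD.inv_cdf(1.0 - alpha)                   # z_{1-alpha}
    arg = (z * math.sqrt(nu0) + math.sqrt(n) * (mu0 - mu_alt)) / math.sqrt(nu_alt)
    return 1.0 - _STD.cdf(arg)

def power_association(n, mu0, nu0, mu_alt, nu_alt, alpha=0.05):
    """Asymptotic power of the level-alpha test against an association alternative."""
    z = _STD.inv_cdf(alpha)                         # z_alpha
    arg = (z * math.sqrt(nu0) + math.sqrt(n) * (mu0 - mu_alt)) / math.sqrt(nu_alt)
    return _STD.cdf(arg)

# Null moments for r = 2; the alternative moments below are made-up placeholders.
p10 = power_segregation(10, 5 / 8, 25 / 192, 0.70, 0.12)
p100 = power_segregation(100, 5 / 8, 25 / 192, 0.70, 0.12)
```

When the alternative moments equal the null moments, both functions return $\alpha$, as they should; with a larger alternative mean, the segregation power grows with $n$.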




Figure 12: Asymptotic power function against the segregation alternative as a function of $r$ for $n = 10$
(first from left) and $n = 100$ (second), and against the association alternative $H^A_{\sqrt{3}/12}$ as a function of $r$ for $n = 10$ (third)
and $n = 100$ (fourth).



5 Multiple Triangle Case 



Suppose $\mathcal{Y}$ is a finite collection of points in $\mathbb{R}^2$ with $|\mathcal{Y}| \ge 3$. Consider the Delaunay triangulation (assumed
to exist) of $\mathcal{Y}$, where $T_j$ denotes the $j$th Delaunay triangle, $J$ denotes the number of triangles, and $C_H(\mathcal{Y})$
denotes the convex hull of $\mathcal{Y}$. We wish to test $H_0 : X_i \overset{iid}{\sim} \mathcal{U}(C_H(\mathcal{Y}))$ against segregation and association
alternatives.

The digraph $D$ is constructed using $N_{\mathcal{Y}_j}^r(\cdot)$ as described in Section 2.3, where for $X_i \in T_j$ the three points
in $\mathcal{Y}$ defining the Delaunay triangle $T_j$ are used as $\mathcal{Y}_j$. Let $\rho_n(r, J)$ be the relative density of the digraph
based on $\mathcal{X}_n$ and $\mathcal{Y}$ which yields $J$ Delaunay triangles, and let $w_j := A(T_j)/A(C_H(\mathcal{Y}))$ for $j = 1, \ldots, J$,
where $A(C_H(\mathcal{Y})) = \sum_{j=1}^{J} A(T_j)$ with $A(\cdot)$ being the area functional. Then we obtain the following as a
corollary to Theorem 2.

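Corollary 1: The asymptotic null distribution for $\rho_n(r, J)$ conditional on $W = \{w_1, \ldots, w_J\}$ for $r \in [1, \infty]$
is given by $\mathcal{N}(\mu(r, J), \nu(r, J)/n)$ provided that $\nu(r, J) > 0$, with

$$\mu(r,J) := \mu(r) \sum_{j=1}^{J} w_j^2 \quad \text{and} \quad \nu(r,J) := \nu(r) \sum_{j=1}^{J} w_j^3 + 4\,\mu(r)^2 \left[\sum_{j=1}^{J} w_j^3 - \Bigl(\sum_{j=1}^{J} w_j^2\Bigr)^2\right], \qquad (12)$$

where $\mu(r)$ and $\nu(r)$ are the one-triangle mean and asymptotic variance given earlier.
Proof: See Appendix 4. ■

By an appropriate application of Jensen's Inequality, we see that $\sum_{j=1}^{J} w_j^3 \ge \bigl(\sum_{j=1}^{J} w_j^2\bigr)^2$. Therefore,
$\nu(r, J) = 0$ iff $\nu(r) = 0$ and $\sum_{j=1}^{J} w_j^3 = \bigl(\sum_{j=1}^{J} w_j^2\bigr)^2$; so asymptotic normality may hold even when $\nu(r) = 0$.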
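As an illustration of how the triangle areas enter the multiple-triangle moments, the sketch below computes $\mu(r,J)$ and $\nu(r,J)$ from one-triangle moments and a list of Delaunay triangle areas. The function name and the numeric inputs in the test are hypothetical; the formulas are those stated in Corollary 1.

```python
def multi_triangle_moments(mu_r, nu_r, areas):
    """Return (mu(r,J), nu(r,J)) given one-triangle moments and triangle areas."""
    total = sum(areas)
    w = [a / total for a in areas]      # relative areas w_j
    s2 = sum(x ** 2 for x in w)         # sum of w_j^2
    s3 = sum(x ** 3 for x in w)         # sum of w_j^3
    mu_j = mu_r * s2
    nu_j = nu_r * s3 + 4.0 * mu_r ** 2 * (s3 - s2 ** 2)
    return mu_j, nu_j


def jensen_gap(areas):
    """sum w_j^3 - (sum w_j^2)^2, which Jensen's inequality makes nonnegative."""
    total = sum(areas)
    w = [a / total for a in areas]
    return sum(x ** 3 for x in w) - sum(x ** 2 for x in w) ** 2
```

For equal areas the Jensen gap vanishes, so the second term of $\nu(r,J)$ contributes only when the triangles differ in size.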

Similarly, for the segregation (association) alternatives with $4\epsilon^2/3 \times 100\%$ of the triangles around the
vertices of each triangle forbidden (allowed), we obtain the above asymptotic distribution of $\rho_n(r)$ with
$\mu(r)$ being replaced by $\mu_S(r,\epsilon)$, $\nu(r)$ by $\nu_S(r,\epsilon)$, $\mu(r,J)$ by $\mu_S(r,J,\epsilon)$, and $\nu(r,J)$ by $\nu_S(r,J,\epsilon)$. Likewise
for association.

Thus in the case of $J > 1$, we have a (conditional) test of $H_0 : X_i \stackrel{iid}{\sim} \mathcal{U}(C_H(\mathcal{Y}))$ which once again rejects
against segregation for large values of $\rho_n(r, J)$ and rejects against association for small values of $\rho_n(r, J)$.
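The conditional test can be sketched as a one-sided normal test on the standardized relative density. This is an illustrative sketch under the asymptotic normal approximation of Corollary 1; the function name is hypothetical, and the observed density and null moments are assumed to be supplied by the caller.

```python
from math import sqrt
from statistics import NormalDist


def conditional_test_pvalues(rho_obs, n, mu_j, nu_j):
    """Return (p_segregation, p_association) for an observed relative density.

    Segregation is supported by large densities (right tail),
    association by small densities (left tail).
    """
    if nu_j <= 0:
        raise ValueError("nu(r, J) must be positive for the normal approximation")
    z = sqrt(n) * (rho_obs - mu_j) / sqrt(nu_j)
    phi = NormalDist().cdf(z)
    return 1.0 - phi, phi
```

The two p-values always sum to one, so at most one of the two alternatives can be favored by a given realization.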

Depicted in Figure 13 are the segregation (with $\delta = 1/16$, i.e., $\epsilon = \sqrt{3}/8$), null, and association (with
$\delta = 1/4$, i.e., $\epsilon = \sqrt{3}/12$) realizations (from left to right) with $n = 1000$, $|\mathcal{Y}| = 10$, and $J = 13$. For the
null realization, the p-value is greater than 0.1 for all $r$ values and both alternatives. For the segregation
realization, we obtain $p < 0.0031$ for $1 < r \le 5$ and $p > 0.24$ for $r = 1$ and $r \ge 10$. For the association
realization, we obtain $p < 0.0135$ for $1 < r \le 3$, $p = .14$ for $r = 1$, and $p > 0.25$ for $r \ge 5$. Note that this
is only for one realization of $\mathcal{X}_n$.








Figure 13: Realization of segregation (left), $H_0$ (middle), and association (right) for $|\mathcal{Y}| = 10$, $J = 13$, and
$n = 1000$.



Figure 14: Monte Carlo power using the asymptotic critical value against $H^S_{\sqrt{3}/8}$, as a function of $r$, for
$n = 100$ (left), $n = 200$ (middle), and $n = 500$ (right) conditional on the realization of $\mathcal{Y}$ in Figure 13. The
circles represent the empirical significance levels while triangles represent the empirical power values.




Figure 15: Monte Carlo power using the asymptotic critical value against $H^A_{\sqrt{3}/12}$, as a function of $r$, for
$n = 100$ (left), $n = 200$ (middle), and $n = 500$ (right) conditional on the realization of $\mathcal{Y}$ in Figure 13. The
circles represent the empirical significance levels while triangles represent the empirical power values.



We implement the above described Monte Carlo experiment 1000 times with $n = 100$, $n = 200$, and $n = 500$
and find the empirical significance levels $\hat{\alpha}_S(n, J)$ and $\hat{\alpha}_A(n, J)$ and the empirical powers $\hat{\beta}^S_n(r, \sqrt{3}/8, J)$
and $\hat{\beta}^A_n(r, \sqrt{3}/12, J)$. These empirical estimates are presented in Table 1 and plotted in Figures 14 and 15.
Notice that the empirical significance levels are all larger than .05 for both alternatives, so this test is liberal
in rejecting $H_0$ against both alternatives for the given realization of $\mathcal{Y}$ and $n$ values. The smallest empirical
significance levels and highest empirical power estimates occur at moderate $r$ values ($r = 3/2, 2, 3$) against
segregation and at smaller $r$ values ($r = \sqrt{2}, 3/2$) against association. Based on this analysis, for the given
realization of $\mathcal{Y}$, we suggest the use of moderate $r$ values for segregation and slightly smaller ones for association.
Notice also that as $n$ increases, the empirical power estimates get larger for both alternatives.

The conditional test presented here is appropriate when the $W$ are fixed, not random. An unconditional
version requires the joint distribution of the number and relative size of Delaunay triangles when $\mathcal{Y}$ is, for
instance, a Poisson point pattern. Alas, this joint distribution is not available (Okabe et al. (2000)).



5.1 Related Test Statistics in Multiple Triangle Case 

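For $J > 1$, we have derived the asymptotic distribution of $\rho_n(r, J) = |\mathcal{A}|/(n(n-1))$. Let $|\mathcal{A}_j|$ be the number of
arcs, $n_j := |\mathcal{X}_n \cap T_j|$, and $\rho_{n_j}(r)$ be the arc density for triangle $T_j$ for $j = 1, \ldots, J$. So
$\sum_{j=1}^{J} \frac{n_j(n_j-1)}{n(n-1)} \rho_{n_j}(r) = \rho_n(r, J)$, since

$$\sum_{j=1}^{J} \frac{n_j(n_j-1)}{n(n-1)}\, \rho_{n_j}(r) = \frac{\sum_{j=1}^{J} |\mathcal{A}_j|}{n(n-1)} = \frac{|\mathcal{A}|}{n(n-1)} = \rho_n(r, J).$$

| $r$ | 1 | 11/10 | 6/5 | 4/3 | $\sqrt{2}$ | 3/2 | 2 | 3 | 5 | 10 |
| $n = 100$, $N = 1000$ |
| $\hat{\alpha}_S(n, J)$ | 0.144 | 0.141 | 0.124 | 0.101 | 0.095 | 0.087 | 0.070 | 0.075 | 0.071 | 0.072 |
| $\hat{\beta}^S_n(r, \sqrt{3}/8, J)$ | 0.191 | 0.383 | 0.543 | 0.668 | 0.714 | 0.742 | 0.742 | 0.625 | 0.271 | 0.124 |
| $\hat{\alpha}_A(n, J)$ | 0.118 | 0.111 | 0.089 | 0.081 | 0.065 | 0.062 | 0.067 | 0.064 | 0.068 | 0.071 |
| $\hat{\beta}^A_n(r, \sqrt{3}/12, J)$ | 0.231 | 0.295 | 0.356 | 0.338 | 0.269 | 0.209 | 0.148 | 0.095 | 0.113 | 0.167 |
| $n = 200$, $N = 1000$ |
| $\hat{\alpha}_S(n, J)$ | 0.095 | 0.092 | 0.087 | 0.077 | 0.073 | 0.076 | 0.072 | 0.071 | 0.074 | 0.073 |
| $\hat{\beta}^S_n(r, \sqrt{3}/8, J)$ | 0.135 | 0.479 | 0.743 | 0.886 | 0.927 | 0.944 | 0.959 | 0.884 | 0.335 | 0.105 |
| $\hat{\alpha}_A(n, J)$ | 0.071 | 0.071 | 0.062 | 0.057 | 0.055 | 0.047 | 0.038 | 0.035 | 0.036 | 0.040 |
| $\hat{\beta}^A_n(r, \sqrt{3}/12, J)$ | 0.182 | 0.317 | 0.610 | 0.886 | 0.952 | 0.985 | 0.972 | 0.386 | 0.143 | 0.068 |
| $n = 500$, $N = 1000$ |
| $\hat{\alpha}_S(n, J)$ | 0.089 | 0.092 | 0.087 | 0.086 | 0.080 | 0.078 | 0.079 | 0.079 | 0.076 | 0.081 |
| $\hat{\beta}^S_n(r, \sqrt{3}/8, J)$ | 0.145 | 0.810 | 0.981 | 0.997 | 0.999 | 1.000 | 1.000 | 1.000 | 0.604 | 0.130 |
| $\hat{\alpha}_A(n, J)$ | 0.087 | 0.085 | 0.076 | 0.075 | 0.073 | 0.075 | 0.072 | 0.067 | 0.066 | 0.061 |
| $\hat{\beta}^A_n(r, \sqrt{3}/12, J)$ | 0.241 | 0.522 | 0.937 | 1.000 | 1.000 | 1.000 | 1.000 | 0.712 | 0.187 | 0.063 |

Table 1: The empirical significance level and empirical power values under $H^S_{\sqrt{3}/8}$ and $H^A_{\sqrt{3}/12}$, $N = 1000$,
$n = 100, 200, 500$, and $J = 13$, at $\alpha = .05$ for the realization of $\mathcal{Y}$ in Figure 13.

Let $U_n := \sum_{j=1}^{J} w_j^2\, \rho_{n_j}(r)$ where $w_j = A(T_j)/A(C_H(\mathcal{Y}))$. Since the $\rho_{n_j}(r)$ are asymptotically independent,
$\sqrt{n}(U_n - \mu(r, J))$ and $\sqrt{n}(\rho_n(r, J) - \mu(r, J))$ both converge in distribution to $\mathcal{N}(0, \nu(r, J))$.

In the denominator of $\rho_n(r, J)$, we use $n(n-1)$ as the maximum number of arcs possible. However,
by definition, we can at most have a digraph with $J$ complete symmetric components of order $n_j$, for
$j = 1, \ldots, J$. Then the maximum number possible is $n_t := \sum_{j=1}^{J} n_j(n_j - 1)$. Then the (adjusted) arc
density is $\rho^{adj}_{n,J} := |\mathcal{A}|/n_t$. Then $\rho^{adj}_{n,J}(r) = \frac{\sum_{j=1}^{J} |\mathcal{A}_j|}{n_t} = \sum_{j=1}^{J} \frac{n_j(n_j-1)}{n_t}\, \rho_{n_j}(r)$. Since $\frac{n_j(n_j-1)}{n_t} \ge 0$ for each $j$,
and $\sum_{j=1}^{J} \frac{n_j(n_j-1)}{n_t} = 1$, $\rho^{adj}_{n,J}(r)$ is a mixture of the $\rho_{n_j}(r)$'s. Then $\rho^{adj}_{n,J}(r)$ is asymptotically normal with mean
$E[\rho^{adj}_{n,J}(r)] = \mu(r, J)$ and the variance of $\rho^{adj}_{n,J}(r)$ is

$$\frac{1}{n}\left[\nu(r)\left(\frac{\sum_{j=1}^{J} w_j^3}{\bigl(\sum_{j=1}^{J} w_j^2\bigr)^2}\right) + 4\,\mu(r)^2\left(\frac{\sum_{j=1}^{J} w_j^3}{\bigl(\sum_{j=1}^{J} w_j^2\bigr)^2} - 1\right)\right].$$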
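The relation between the overall arc density, the per-triangle arc densities, and the adjusted arc density can be checked on toy counts. The sketch below is illustrative only; the function name and the example counts are hypothetical, and each triangle is assumed to contain at least two points so that its arc density is defined.

```python
def arc_densities(arc_counts, point_counts):
    """Return (rho_n, rho_adj, per_triangle) from per-triangle arc and point counts."""
    n = sum(point_counts)
    n_t = sum(nj * (nj - 1) for nj in point_counts)  # max arcs within triangles
    total_arcs = sum(arc_counts)
    rho_n = total_arcs / (n * (n - 1))               # overall relative density
    rho_adj = total_arcs / n_t                       # adjusted arc density
    per_triangle = [a / (nj * (nj - 1)) for a, nj in zip(arc_counts, point_counts)]
    return rho_n, rho_adj, per_triangle


def weighted_sum(per_triangle, point_counts, denominator):
    """sum_j n_j (n_j - 1) / denominator * rho_{n_j}."""
    return sum(nj * (nj - 1) / denominator * rho
               for rho, nj in zip(per_triangle, point_counts))
```

Using `denominator = n * (n - 1)` in `weighted_sum` recovers the overall density, while `denominator = n_t` recovers the adjusted density, which mirrors the two identities in the text.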



5.2 Asymptotic Efficacy Analysis for J > 1 

The PAE, HLAE, and asymptotic power function analyses are given for $J = 1$ in Sections 4.3, 4.4, and 4.5,
respectively. For $J > 1$, the analysis depends on both the number of triangles and the sizes of the
triangles. So the optimal $r$ values with respect to these efficiency criteria for $J = 1$ do not necessarily hold
for $J > 1$; hence the analyses need to be updated, given the values of $J$ and $W$.

Under the segregation alternative $H^S_\epsilon$, the PAE of $\rho_n(r, J)$ is given by

$$\mathrm{PAE}^S_J(r) = \frac{\bigl(\mu_S''(r, J, \epsilon = 0)\bigr)^2}{\nu(r, J)} = \frac{\bigl(\mu_S''(r, \epsilon = 0) \sum_{j=1}^{J} w_j^2\bigr)^2}{\nu(r) \sum_{j=1}^{J} w_j^3 + 4\,\mu(r)^2 \Bigl(\sum_{j=1}^{J} w_j^3 - \bigl(\sum_{j=1}^{J} w_j^2\bigr)^2\Bigr)}.$$

Under the association alternative $H^A_\epsilon$, the PAE of $\rho_n(r, J)$ is similar. In Figure 16 we present the PAE as a
function of $r$ for both segregation and association conditional on the realization of $\mathcal{Y}$ in Figure 13. Notice that,
unlike the $J = 1$ case, $\mathrm{PAE}^S_J(r)$ is bounded. Some values of note are $\mathrm{PAE}^S_J(\rho_n(1)) = .3884$, $\lim_{r \to \infty} \mathrm{PAE}^S_J(r) \approx 139.34$,
and $\mathrm{argsup}_{r \in [1,2]} \mathrm{PAE}^S_J(r) \approx 1.974$. As for association, $\mathrm{PAE}^A_J(r = 1) = 422.9551$,
$\lim_{r \to \infty} \mathrm{PAE}^A_J(r) = 0$, and $\mathrm{argsup}_{r \ge 1} \mathrm{PAE}^A_J(r) = 1.5$ with $\mathrm{PAE}^A_J(r = 1.5) \approx 1855.9672$. Based on the
asymptotic efficiency analysis, we suggest, for large $n$ and small $\epsilon$, choosing moderate $r$ for testing against
segregation and association.

Figure 17: Hodges-Lehmann asymptotic efficiency against segregation alternative $H^S_\epsilon$ as a function of $r$ for
$\epsilon = \sqrt{3}/8, \sqrt{3}/4, 2\sqrt{3}/7$ (left to right) and $J = 13$.

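Under segregation, the HLAE of $\rho_n(r, J)$ is given by

$$\mathrm{HLAE}^S_J(r, \epsilon) := \frac{\bigl(\mu_S(r, J, \epsilon) - \mu(r, J)\bigr)^2}{\nu_S(r, J, \epsilon)} = \frac{\Bigl(\mu_S(r, \epsilon) \bigl(\sum_{j=1}^{J} w_j^2\bigr) - \mu(r) \bigl(\sum_{j=1}^{J} w_j^2\bigr)\Bigr)^2}{\nu_S(r, \epsilon) \sum_{j=1}^{J} w_j^3 + 4\,\mu_S(r, \epsilon)^2 \Bigl(\sum_{j=1}^{J} w_j^3 - \bigl(\sum_{j=1}^{J} w_j^2\bigr)^2\Bigr)}.$$

Notice that $\mathrm{HLAE}^S_J(r, \epsilon = 0) = 0$ and $\lim_{r \to \infty} \mathrm{HLAE}^S_J(r, \epsilon) = 0$, and HLAE is bounded provided that $\nu(r, J) > 0$.

We calculate HLAE of $\rho_n(r, J)$ under $H^S_\epsilon$ for $\epsilon = \sqrt{3}/8$, $\epsilon = \sqrt{3}/4$, and $\epsilon = 2\sqrt{3}/7$. In Figure 17 we
present $\mathrm{HLAE}^S_J(r, \epsilon)$ for these $\epsilon$ values conditional on the realization of $\mathcal{Y}$ in Figure 13. Note that with
$\epsilon = \sqrt{3}/8$, $\mathrm{HLAE}^S_J(r = 1, \sqrt{3}/8) \approx .0004$ and $\mathrm{argsup}_{r \in [1,\infty]} \mathrm{HLAE}^S_J(r, \sqrt{3}/8) \approx 1.8928$ with the supremum $\approx .0544$.
With $\epsilon = \sqrt{3}/4$, $\mathrm{HLAE}^S_J(r = 1, \sqrt{3}/4) \approx .0450$ and $\mathrm{argsup}_{r \in [1,\infty]} \mathrm{HLAE}^S_J(r, \sqrt{3}/4) \approx 1.3746$ with the
supremum $\approx .6416$. With $\epsilon = 2\sqrt{3}/7$, $\mathrm{HLAE}^S_J(r = 1, 2\sqrt{3}/7) \approx .045$ and $\mathrm{argsup}_{r \in [1,\infty]} \mathrm{HLAE}^S_J(r, 2\sqrt{3}/7) \approx 1.3288$
with the supremum $\approx .9844$. Furthermore, we observe that $\mathrm{HLAE}^S_J(r, 2\sqrt{3}/7) > \mathrm{HLAE}^S_J(r, \sqrt{3}/4) > \mathrm{HLAE}^S_J(r, \sqrt{3}/8)$.
Based on the HLAE analysis for the given $\mathcal{Y}$, we suggest moderate $r$ values for moderate
segregation and small $r$ values for severe segregation.

The explicit form of $\mathrm{HLAE}^A_J(r, \epsilon)$ is similar to $\mathrm{HLAE}^S_J(r, \epsilon)$, which implies $\mathrm{HLAE}^A_J(r, \epsilon = 0) = 0$ and
$\lim_{r \to \infty} \mathrm{HLAE}^A_J(r, \epsilon) = 0$.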

We calculate HLAE of $\rho_n(r, J)$ under $H^A_\epsilon$ for $\epsilon = \sqrt{3}/21$, $\epsilon = \sqrt{3}/12$, and $\epsilon = 5\sqrt{3}/24$. In Figure 18 we
present $\mathrm{HLAE}^A_J(r, \epsilon)$ for these $\epsilon$ values conditional on the realization of $\mathcal{Y}$ in Figure 13. Note that with
$\epsilon = \sqrt{3}/21$, $\mathrm{HLAE}^A_J(r = 1, \sqrt{3}/21) \approx .0009$ and $\mathrm{argsup}_{r \in [1,\infty]} \mathrm{HLAE}^A_J(r, \sqrt{3}/21) \approx 1.5734$ with the supremum
$\approx .0157$. With $\epsilon = \sqrt{3}/12$, $\mathrm{HLAE}^A_J(r = 1, \sqrt{3}/12) \approx .0168$ and $\mathrm{argsup}_{r \in [1,\infty]} \mathrm{HLAE}^A_J(r, \sqrt{3}/12) \approx 1.6732$
with the supremum $\approx .1818$. With $\epsilon = 5\sqrt{3}/24$, $\mathrm{HLAE}^A_J(r = 1, 5\sqrt{3}/24) \approx .0017$ and
$\mathrm{argsup}_{r \in [1,\infty]} \mathrm{HLAE}^A_J(r, 5\sqrt{3}/24) \approx 3.2396$ with the supremum $\approx 5.7616$. Furthermore, we observe that
$\mathrm{HLAE}^A_J(r, 5\sqrt{3}/24) > \mathrm{HLAE}^A_J(r, \sqrt{3}/12) > \mathrm{HLAE}^A_J(r, \sqrt{3}/21)$. Based on the HLAE analysis for the given
$\mathcal{Y}$, we suggest moderate $r$ values for moderate association and large $r$ values for severe association.

Figure 18: Hodges-Lehmann asymptotic efficiency against association alternative $H^A_\epsilon$ as a function of $r$ for
$\epsilon = \sqrt{3}/21, \sqrt{3}/12, 5\sqrt{3}/24$ (left to right) and $J = 13$.



6 Discussion and Conclusions 



In this article we investigate the mathematical properties of a random digraph method for the analysis of 
spatial point patterns. 

The first proximity map in the literature similar to the $r$-factor proximity map $N^r_{\mathcal{Y}}$ is the spherical proximity
map $N_S(x) := B(x, r(x))$ (see the references for CCCD in the Introduction). A slight variation of $N_S$ is the
arc-slice proximity map $N_{AS}(x) := B(x, r(x)) \cap T(x)$, where $T(x)$ is the Delaunay cell that contains $x$ (see
Ceyhan and Priebe (2003a)). Furthermore, Ceyhan and Priebe introduced the central similarity proximity
map $N_{CS}$ in Ceyhan and Priebe (2003a) and $N^r_{\mathcal{Y}}$ in Ceyhan and Priebe (2003b). The $r$-factor proximity
map, when compared to the others, has the advantages that the asymptotic distribution of the domination
number $\gamma_n(N^r_{\mathcal{Y}})$ is tractable (see Ceyhan and Priebe (2003b)) and that an exact minimum dominating set can be
found in polynomial time. Moreover, $N^r_{\mathcal{Y}}$ and $N_{CS}$ are geometry invariant for uniform data over triangles.
Additionally, the mean and variance of the relative density $\rho_n$ are not analytically tractable for $N_S$ and $N_{AS}$.
While $N^r_{\mathcal{Y}}(x)$, $N_{CS}(x)$, and $N_{AS}(x)$ are well defined only for $x \in C_H(\mathcal{Y})$, the convex hull of $\mathcal{Y}$, $N_S(x)$ is well
defined for all $x \in \mathbb{R}^d$. The proximity maps $N_S$ and $N_{AS}$ require no effort to extend to higher dimensions.



The $N_S$ (the proximity map associated with CCCD) is used in classification in the literature, but not
for testing spatial patterns between two or more classes. We develop a technique to test the patterns
of segregation or association. There are many tests available for segregation and association in the ecology
literature. See Dixon (1994) for a survey of these tests and relevant references. Two of the most commonly
used tests are Pielou's $\chi^2$ test of independence and Ripley's test based on the $K(t)$ and $L(t)$ functions. However,
the test we introduce here is not comparable to either of them. Our test is a conditional test, conditional on
a realization of $J$ (number of Delaunay triangles) and $W$ (the set of relative areas of the Delaunay triangles),
and we require that the number of triangles $J$ be fixed and relatively small compared to $n = |\mathcal{X}_n|$. Furthermore,
our method deals with a slightly different type of data than most methods for examining spatial patterns. The
sample size for one type of point (type $\mathcal{X}$ points) is much larger than that for the other (type $\mathcal{Y}$ points).
This implies that in practice, $\mathcal{Y}$ could be stationary or have a much longer life span than members of $\mathcal{X}$. For
example, a special type of fungi might constitute the $\mathcal{X}$ points, while the tree species around which the fungi
grow might be viewed as the $\mathcal{Y}$ points.



There are two major types of asymptotic structures for spatial data (Lahiri (1996)). In the first, any two
observations are required to be at least a fixed distance apart; hence as the number of observations increases,
the region on which the process is observed eventually becomes unbounded. This type of sampling structure
is called "increasing domain asymptotics". In the second type, the region of interest is a fixed bounded
region and more and more points are observed in this region. Hence the minimum distance between data
points tends to zero as the sample size tends to infinity. This type of structure is called "infill asymptotics",
due to Cressie (1991). The sampling structure for our asymptotic analysis is infill, as only the size
of the type $\mathcal{X}$ process tends to infinity, while the support, the convex hull of a given set of points from the type
$\mathcal{Y}$ process, $C_H(\mathcal{Y})$, is a fixed bounded region.

Moreover, our statistic can be written as a $U$-statistic based on the locations of type $\mathcal{X}$ points with
respect to type $\mathcal{Y}$ points. This is one advantage of the proposed method: most statistics for spatial patterns
cannot be written as $U$-statistics. The $U$-statistic form gives us asymptotic normality, once the mean
and variance are obtained by detailed geometric calculations.

The null hypothesis we consider is considerably more restrictive than current approaches, which can be
used much more generally. The null hypothesis for testing segregation or association can be described in two
slightly different forms (Dixon (1994)):



(i) complete spatial randomness, that is, each class is distributed randomly throughout the area of interest. 
It describes both the arrangement of the locations and the association between classes. 

(ii) random labeling of locations, which is less restrictive than spatial randomness, in the sense that the
arrangement of the locations can be either random or non-random.

Our conditional test is closer to the former in this regard. Pielou's test provides insight only on the association
between classes; hence there is no assumption on the allocation of the observations, which makes it more
appropriate for testing the null hypothesis of random labeling. Ripley's test can be used for both types of
null hypotheses; in particular, it can be used to test one type of spatial randomness against another type of
spatial randomness.



The test based on the mean domination number in Ceyhan and Priebe (2003b) is not a conditional test,
but requires both $n$ and the number of Delaunay triangles $J$ to be large. The comparison for a large but fixed $J$
is possible. Furthermore, under segregation alternatives, the Pitman asymptotic efficiency is not applicable
to the mean domination number case; however, for large $n$ and $J$ we suggest the use of it over arc density
since for each $\epsilon > 0$, the Hodges-Lehmann asymptotic efficiency is unbounded for the mean domination number
case, while it is bounded for the arc density case with $J > 1$. As for the association alternative, HLAE suggests
moderate $r$ values, which have finite Hodges-Lehmann asymptotic efficiency. So again, for large $J$ and $n$ the mean
domination number is preferable. The basic advantage of $\rho_n(r)$ is that it does not require $J$ to be large, so
for small $J$ it is preferable.

Although the statistical analysis and the mathematical properties related to the $r$-factor proximity catch
digraph are done in $\mathbb{R}^2$, the extension to $\mathbb{R}^d$ with $d > 2$ is straightforward. See
Ceyhan and Priebe (2003b) for more detail on the construction of the associated proximity region in higher
dimensions. Moreover, the geometry invariance, asymptotic normality of the $U$-statistic, and consistency of
the tests hold for $d > 2$.
Appendix 1: Derivation of $\mu(r)$ and $\nu(r)$

In the standard equilateral triangle, let $y_1 = (0,0)$, $y_2 = (1,0)$, $y_3 = (1/2, \sqrt{3}/2)$, $M_C$ be the center of
mass, and $M_j$ be the midpoints of the edges $e_j$ for $j = 1,2,3$. Then $M_C = (1/2, \sqrt{3}/6)$, $M_1 = (3/4, \sqrt{3}/4)$,
$M_2 = (1/4, \sqrt{3}/4)$, $M_3 = (1/2, 0)$.

Recall that $E[\rho_n(r)] = \frac{1}{n(n-1)} \sum\sum_{i<j} E[h_{ij}] = \frac{1}{2} E[h_{12}] = \mu(r) = P(X_j \in N^r_{\mathcal{Y}}(X_i))$.

Let $\mathcal{X}_n$ be a random sample of size $n$ from $\mathcal{U}(T(\mathcal{Y}))$. For $x_1 = (u,v)$, $\ell_r(x_1) = r v + r \sqrt{3}\, u - \sqrt{3}\, x$. Next,
let $N_1 := \ell_r(x_1) \cap e_3$ and $N_2 := \ell_r(x_1) \cap e_2$. Then for $x_1 \in T_s := T(y_1, M_3, M_C)$, $N^r_{\mathcal{Y}}(x_1) = T(y_1, N_1, N_2)$
provided that $\ell_r(x_1)$ is not outside of $T(\mathcal{Y})$, where

$$N_1 = \bigl(r (v + \sqrt{3}\, u) \sqrt{3}/3,\; 0\bigr) \quad \text{and} \quad N_2 = \bigl(r (v + \sqrt{3}\, u) \sqrt{3}/6,\; r (v + \sqrt{3}\, u)/2\bigr).$$

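Now we find $\mu(r)$ for $r \in [1, \infty)$.

First, observe that, by symmetry,

$$\mu(r) = P(X_2 \in N^r_{\mathcal{Y}}(X_1)) = 6\, P(X_2 \in N^r_{\mathcal{Y}}(X_1), X_1 \in T_s).$$

Let $\ell_s(r,x)$ be the line such that $r\, d(y_1, \ell_s(r,x)) = d(y_1, e_1)$ and $\ell_s(r,x) \cap T(\mathcal{Y}) \ne \emptyset$, so $\ell_s(r,x) = \sqrt{3}(1/r - x)$.
Then if $x_1 \in T_s$ is above $\ell_s(r,x)$ then $N^r_{\mathcal{Y}}(x_1) = T(\mathcal{Y})$; otherwise, $N^r_{\mathcal{Y}}(x_1) = T_r(x_1) \subsetneq T(\mathcal{Y})$.

For $r \in [1, 3/2)$, $\ell_s(r,x) \cap T_s = \emptyset$, so $N^r_{\mathcal{Y}}(x_1) = T_r(x_1) \subsetneq T(\mathcal{Y})$ for all $x_1 \in T_s$. Then

$$P(X_2 \in N^r_{\mathcal{Y}}(X_1), X_1 \in T_s) = \int_0^{1/2} \int_0^{x/\sqrt{3}} \frac{A(N^r_{\mathcal{Y}}(x_1))}{A(T(\mathcal{Y}))^2}\, dy\, dx = \frac{37}{1296}\, r^2,$$

where $A(N^r_{\mathcal{Y}}(x_1)) = \frac{\sqrt{3}}{12}\, r^2 (y + \sqrt{3}\, x)^2$ and $A(T(\mathcal{Y})) = \sqrt{3}/4$. Hence for $r \in [1, 3/2)$, $\mu(r) = \frac{37}{216}\, r^2$.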
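The value $37 r^2/1296$ can be checked numerically. The sketch below approximates the double integral over $T_s$ with a midpoint rule, using the integrand $A(N^r_{\mathcal{Y}}(x_1))/A(T(\mathcal{Y}))^2$ with $A(N^r_{\mathcal{Y}}(x_1)) = \frac{\sqrt{3}}{12} r^2 (y + \sqrt{3}x)^2$ as read from the derivation; it is valid only for $r \in [1, 3/2)$, and the function name and step count are illustrative choices.

```python
from math import sqrt


def mu_numeric(r, steps=200):
    """Midpoint-rule estimate of mu(r) = 6 * P(X2 in N(X1), X1 in T_s), r in [1, 3/2)."""
    if not 1.0 <= r < 1.5:
        raise ValueError("this integrand is only valid for r in [1, 3/2)")
    area_t = sqrt(3) / 4                      # area of the standard equilateral triangle
    hx = 0.5 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * hx
        hy = (x / sqrt(3)) / steps            # y ranges over [0, x / sqrt(3)]
        for k in range(steps):
            y = (k + 0.5) * hy
            area_n = sqrt(3) / 12 * r ** 2 * (y + sqrt(3) * x) ** 2
            total += area_n * hx * hy
    return 6.0 * total / area_t ** 2
```

Comparing `mu_numeric(r)` with `37 * r**2 / 216` for a few values of `r` in the interval gives agreement to several decimal places.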

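For $r \in [3/2, 2)$, $\ell_s(r,x)$ crosses through $\overline{M_3 M_C}$. Let the $x$ coordinate of $\ell_s(r,x) \cap \overline{y_1 M_C}$ be $s_1$; then
$s_1 = 3/(4r)$. See Figure 19 for the relative position of $\ell_s(r,x)$ and $T_s$.

Then

$$P(X_2 \in N^r_{\mathcal{Y}}(X_1), X_1 \in T_s) = \int_0^{s_1} \int_0^{x/\sqrt{3}} \frac{A(N^r_{\mathcal{Y}}(x_1))}{A(T(\mathcal{Y}))^2}\, dy\, dx + \int_{s_1}^{1/2} \int_0^{\ell_s(r,x)} \frac{A(N^r_{\mathcal{Y}}(x_1))}{A(T(\mathcal{Y}))^2}\, dy\, dx + \int_{s_1}^{1/2} \int_{\ell_s(r,x)}^{x/\sqrt{3}} \frac{1}{A(T(\mathcal{Y}))}\, dy\, dx = -\frac{-36 + r^4 + 64 r - 32 r^2}{48\, r^2}.$$

Hence for $r \in [3/2, 2)$, $\mu(r) = -\frac{1}{8} r^2 + 4 - 8 r^{-1} + \frac{9}{2} r^{-2}$.

For $r \in [2, \infty)$, $\ell_s(r,x)$ crosses through $\overline{y_1 M_3}$. Let the $x$ coordinate of $\ell_s(r,x) \cap \overline{y_1 M_3}$ be $s_2$; then $s_2 = 1/r$.
See Figure 19.

Then

$$P(X_2 \in N^r_{\mathcal{Y}}(X_1), X_1 \in T_s) = \int_0^{s_1} \int_0^{x/\sqrt{3}} \frac{A(N^r_{\mathcal{Y}}(x_1))}{A(T(\mathcal{Y}))^2}\, dy\, dx + \int_{s_1}^{s_2} \int_0^{\ell_s(r,x)} \frac{A(N^r_{\mathcal{Y}}(x_1))}{A(T(\mathcal{Y}))^2}\, dy\, dx + \int_{s_1}^{s_2} \int_{\ell_s(r,x)}^{x/\sqrt{3}} \frac{1}{A(T(\mathcal{Y}))}\, dy\, dx + \int_{s_2}^{1/2} \int_0^{x/\sqrt{3}} \frac{1}{A(T(\mathcal{Y}))}\, dy\, dx = \frac{1}{12}\, \frac{-3 + 2 r^2}{r^2}.$$

Hence for $r \in [2, \infty)$, $\mu(r) = 1 - \frac{3}{2} r^{-2}$.
For $r = \infty$, $\mu(r) = 1$ follows trivially.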
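The three pieces of $\mu(r)$ can be collected into a single function, and continuity at the breakpoints $r = 3/2$ and $r = 2$ serves as a consistency check. This is a sketch assuming the piecewise forms $\frac{37}{216}r^2$, $-\frac{1}{8}r^2 + 4 - 8r^{-1} + \frac{9}{2}r^{-2}$, and $1 - \frac{3}{2}r^{-2}$ on $[1,3/2)$, $[3/2,2)$, and $[2,\infty)$, respectively, as read from the derivation above; exact rational arithmetic is used so that the checks are not affected by rounding.

```python
from fractions import Fraction


def mu(r):
    """Piecewise mean of the relative density for one triangle, r >= 1."""
    r = Fraction(r)
    if r < 1:
        raise ValueError("r must be at least 1")
    if r < Fraction(3, 2):
        return Fraction(37, 216) * r ** 2
    if r < 2:
        return -r ** 2 / 8 + 4 - 8 / r + Fraction(9, 2) / r ** 2
    return 1 - Fraction(3, 2) / r ** 2


def left_piece_at_breakpoints():
    """Values of the left-hand pieces evaluated at r = 3/2 and r = 2."""
    r1, r2 = Fraction(3, 2), Fraction(2)
    first = Fraction(37, 216) * r1 ** 2
    second = -r2 ** 2 / 8 + 4 - 8 / r2 + Fraction(9, 2) / r2 ** 2
    return first, second
```

Both left-hand limits agree with the value of the next piece, so $\mu(r)$ is continuous and increases from $37/216$ at $r = 1$ toward 1.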
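To find $\mathrm{Cov}[h_{12}, h_{13}]$, we introduce a related concept.

Definition: Let $(\Omega, \mathcal{M})$ be a measurable space and consider the proximity map $N : \Omega \times \wp(\Omega) \to \wp(\Omega)$,
where $\wp(\cdot)$ represents the power set functional. For $B \subseteq \Omega$, the $\Gamma_1$-region, $\Gamma_1(\cdot) = \Gamma_1(\cdot, N) : \Omega \to \wp(\Omega)$,
associates the region $\Gamma_1(B) := \{z \in \Omega : B \subseteq N(z)\}$ with each set $B \subseteq \Omega$. For $x \in \Omega$, we denote $\Gamma_1(\{x\})$ as
$\Gamma_1(x)$. Note that the $\Gamma_1$-region depends on the proximity region $N(\cdot)$.

Furthermore, let $\Gamma_1(\cdot, N^r_{\mathcal{Y}})$ be the $\Gamma_1$-region associated with $N^r_{\mathcal{Y}}(\cdot)$, and let $A_{ij}$ be the event that $\{X_i X_j \in \mathcal{A}\} = \{X_j \in N^r_{\mathcal{Y}}(X_i)\}$;
then $h_{ij} = I(A_{ij}) + I(A_{ji})$. Let

$$P^r_{2N} := P(\{X_2, X_3\} \subset N^r_{\mathcal{Y}}(X_1)), \quad P^r_{M} := P(X_2 \in N^r_{\mathcal{Y}}(X_1), X_3 \in \Gamma_1(X_1, N^r_{\mathcal{Y}})), \quad P^r_{2G} := P(\{X_2, X_3\} \subset \Gamma_1(X_1, N^r_{\mathcal{Y}})).$$

Then $\mathrm{Cov}[h_{12}, h_{13}] = E[h_{12} h_{13}] - E[h_{12}] E[h_{13}]$ where

$$E[h_{12} h_{13}] = E[(I(A_{12}) + I(A_{21}))(I(A_{13}) + I(A_{31}))]$$
$$= P(A_{12} \cap A_{13}) + P(A_{12} \cap A_{31}) + P(A_{21} \cap A_{13}) + P(A_{21} \cap A_{31})$$
$$= P(\{X_2, X_3\} \subset N^r_{\mathcal{Y}}(X_1)) + 2 P(X_2 \in N^r_{\mathcal{Y}}(X_1), X_3 \in \Gamma_1(X_1, N^r_{\mathcal{Y}})) + P(\{X_2, X_3\} \subset \Gamma_1(X_1, N^r_{\mathcal{Y}}))$$
$$= P^r_{2N} + 2 P^r_{M} + P^r_{2G}.$$

So $\nu(r) = \mathrm{Cov}[h_{12}, h_{13}] = (P^r_{2N} + 2 P^r_{M} + P^r_{2G}) - [2 \mu(r)]^2$.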

Furthermore, for any $x_1 = (u, v) \in T(\mathcal{Y})$, $\Gamma_1(x_1, N^r_{\mathcal{Y}})$ is a convex or nonconvex polygon. Let $\xi_j(r,x)$ be
the line between $x_1$ and the vertex $y_j$ parallel to the edge $e_j$ such that $r\, d(y_j, \xi_j(r,x)) = d(y_j, \ell_r(x_1))$ for $j = 1, 2, 3$.
Then $\Gamma_1(x_1, N^r_{\mathcal{Y}}) \cap R(y_j)$ is bounded by $\xi_j(r,x)$ and the median lines.

For $x_1 = (u, v)$, $\xi_1(r,x) = -\sqrt{3}\, x + (v + \sqrt{3}\, u)/r$, $\xi_2(r,x) = (v + \sqrt{3}\, r (x - 1) + \sqrt{3}(1 - u))/r$, and
$\xi_3(r,x) = (\sqrt{3}(r - 1) + 2 v)/(2 r)$.

To find the covariance, we need to find the possible types of $\Gamma_1(x_1, N^r_{\mathcal{Y}})$ and $N^r_{\mathcal{Y}}(x_1)$ for $r \in [1, \infty)$. First
we find the possible intersection points of $\xi_j(r,x)$ with $\partial(T(\mathcal{Y}))$ and $\partial(R(y_j))$ for $j = 1, 2, 3$. Let

$$G_1 = \xi_1(r,x) \cap e_3, \quad G_2 = \xi_2(r,x) \cap e_3, \quad G_3 = \xi_2(r,x) \cap e_1, \quad G_4 = \xi_3(r,x) \cap e_1, \quad G_5 = \xi_3(r,x) \cap e_2, \quad G_6 = \xi_1(r,x) \cap e_2.$$




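Figure 20: The prototypes of the six cases for $\Gamma_1(x_1, N^r_{\mathcal{Y}})$ for $x_1 \in T(y_1, M_3, M_C)$ for $r \in [4/3, 3/2)$.

Then, for example, for $x_1 = (x, y)$, $G_5 = \left(\frac{(\sqrt{3}\, r - \sqrt{3} + 2 y)\sqrt{3}}{6 r}, \frac{\sqrt{3}\, r - \sqrt{3} + 2 y}{2 r}\right)$. Furthermore, let

$$L_1 = \xi_1(r,x) \cap \overline{M_1 M_C}, \quad L_2 = \xi_2(r,x) \cap \overline{M_1 M_C}, \quad L_3 = \xi_2(r,x) \cap \overline{M_2 M_C}, \quad L_4 = \xi_3(r,x) \cap \overline{M_2 M_C}, \quad L_5 = \xi_3(r,x) \cap \overline{M_3 M_C}, \quad L_6 = \xi_1(r,x) \cap \overline{M_3 M_C}.$$

Then, for example, $L_5 = \left(-\frac{(\sqrt{3}\, r - 3\sqrt{3} + 6 y)\sqrt{3}}{6 r}, \frac{\sqrt{3}\, r - \sqrt{3} + 2 y}{2 r}\right)$. Then $\Gamma_1(x_1, N^r_{\mathcal{Y}})$ is a polygon whose vertices
are a subset of the $y_j$, $M_C$, $M_j$, $j = 1, 2, 3$, and $G_j$, $L_j$, $j = 1, \ldots, 6$.

See Figure 20 for the prototypes of $\Gamma_1(x_1, N^r_{\mathcal{Y}})$ with $r \in [4/3, 3/2)$.

We partition $[1, \infty)$ with respect to the types of $N^r_{\mathcal{Y}}(x_1)$ and $\Gamma_1(x_1, N^r_{\mathcal{Y}})$ into $[1, 4/3)$, $[4/3, 3/2)$, $[3/2, 2)$, $[2, \infty)$.
For demonstrative purposes we pick the interval $[4/3, 3/2)$. For $r \in [4/3, 3/2)$, there are six cases regarding
$\Gamma_1(x_1, N^r_{\mathcal{Y}})$ and one case for $N^r_{\mathcal{Y}}(x_1)$. Each case $j$ corresponds to the region $R_j$ in Figure 21, where
$s_1 = 1 - 2r/3$, $s_2 = 3/2 - r$, $s_3 = 1 - r/2$, $s_4 = 3/2 - 5r/6$, $s_5 = 3/2 - 3r/4$.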

Let $\mathscr{P}(a_1, a_2, \ldots, a_n)$ denote the polygon with vertices $a_1, a_2, \ldots, a_n$; then, for $x_1 = (x, y) \in R_j$, $j = 1, \ldots, 6$,
the regions $\Gamma_1(x_1, N^r_{\mathcal{Y}})$ are $\mathscr{P}(G_1, M_1, M_C, M_3, G_6)$, $\mathscr{P}(G_1, M_1, L_2, L_3, M_C, M_3, G_6)$, $\mathscr{P}(G_1, G_2, G_3, M_2, M_C, M_3, G_6)$,
$\mathscr{P}(G_1, M_1, L_2, L_3, L_4, L_5, M_3, G_6)$, $\mathscr{P}(G_1, G_2, G_3, M_2, L_4, L_5, M_3, G_6)$, and $\mathscr{P}(G_1, G_2, G_3, G_4, G_5, G_6)$, respectively.

The explicit forms of $R_j$, $j = 1, \ldots, 6$, are as follows:

Figure 21: The regions corresponding to the six cases for $r \in [4/3, 3/2)$.

$R_1 = \{(x,y) \in [0, s_1] \times [0, \ell_{am}(x)] \cup [s_1, s_2] \times [q_1(x), \ell_{am}(x)]\}$

$R_2 = \{(x,y) \in [s_1, s_2] \times [0, q_1(x)] \cup [s_2, s_3] \times [0, q_2(x)] \cup [s_3, s_4] \times [q_3(x), q_2(x)]\}$

$R_3 = \{(x,y) \in [s_3, s_4] \times [0, q_3(x)] \cup [s_4, 1/2] \times [0, q_2(x)]\}$

$R_4 = \{(x,y) \in [s_2, s_4] \times [q_2(x), \ell_{am}(x)] \cup [s_4, s_5] \times [q_3(x), \ell_{am}(x)]\}$

$R_5 = \{(x,y) \in [s_4, s_5] \times [q_2(x), q_3(x)] \cup [s_5, 1/2] \times [q_2(x), q_4(x)]\}$

$R_6 = \{(x,y) \in [s_5, 1/2] \times [q_4(x), \ell_{am}(x)]\}$,

where $\ell_{am}(x) = x/\sqrt{3}$, $q_1(x) = (2r - 3)/\sqrt{3} + \sqrt{3}\, x$, $q_2(x) = \sqrt{3}(1/2 - r/3)$, $q_3(x) = \sqrt{3}(x - 1 + r/2)$, and
$q_4(x) = \sqrt{3}(1/2 - r/4)$.
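The breakpoints $s_1, \ldots, s_5$ are the $x$ coordinates at which the boundary curves meet, which can be checked directly. The sketch below assumes the curve definitions $\ell_{am}, q_1, \ldots, q_4$ and the five breakpoints as stated for $r \in [4/3, 3/2)$; the function name and the returned dictionary layout are illustrative.

```python
from math import sqrt


def region_boundaries(r):
    """Return the boundary curves and breakpoints for r in [4/3, 3/2)."""
    curves = {
        "l_am": lambda x: x / sqrt(3),
        "q1": lambda x: (2 * r - 3) / sqrt(3) + sqrt(3) * x,
        "q2": lambda x: sqrt(3) * (0.5 - r / 3),
        "q3": lambda x: sqrt(3) * (x - 1 + r / 2),
        "q4": lambda x: sqrt(3) * (0.5 - r / 4),
    }
    s = {
        1: 1 - 2 * r / 3,      # q1 meets the x-axis
        2: 1.5 - r,            # q1, q2 and l_am meet
        3: 1 - r / 2,          # q3 meets the x-axis
        4: 1.5 - 5 * r / 6,    # q3 meets q2
        5: 1.5 - 3 * r / 4,    # q3, q4 and l_am meet
    }
    return curves, s
```

For any $r$ in the interval the breakpoints are ordered $0 \le s_1 \le s_2 \le s_3 \le s_4 \le s_5 \le 1/2$, so the six regions tile $T_s$ from left to right.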

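Then $P(\{X_2, X_3\} \subset N^r_{\mathcal{Y}}(X_1)) = \frac{781}{19440}\, r^4$. (We use the same limits of integration as in the $\mu(r)$ calculations,
with the integrand being $A(N^r_{\mathcal{Y}}(x_1))^2 / A(T(\mathcal{Y}))^3$.)

Next, by symmetry, $P(\{X_2, X_3\} \subset \Gamma_1(X_1, N^r_{\mathcal{Y}})) = 6\, P(\{X_2, X_3\} \subset \Gamma_1(X_1, N^r_{\mathcal{Y}}), X_1 \in T(y_1, M_3, M_C))$.
Then

$$P(\{X_2, X_3\} \subset \Gamma_1(X_1, N^r_{\mathcal{Y}}), X_1 \in T(y_1, M_3, M_C)) = \sum_{j=1}^{6} P(\{X_2, X_3\} \subset \Gamma_1(X_1, N^r_{\mathcal{Y}}), X_1 \in R_j).$$

For example, for $x_1 \in R_4$,

$$P(\{X_2, X_3\} \subset \Gamma_1(X_1, N^r_{\mathcal{Y}}), X_1 \in R_4) = \int_{s_2}^{s_4} \int_{q_2(x)}^{\ell_{am}(x)} \frac{A(\Gamma_1(x_1, N^r_{\mathcal{Y}}))^2}{A(T(\mathcal{Y}))^3}\, dy\, dx + \int_{s_4}^{s_5} \int_{q_3(x)}^{\ell_{am}(x)} \frac{A(\Gamma_1(x_1, N^r_{\mathcal{Y}}))^2}{A(T(\mathcal{Y}))^3}\, dy\, dx$$
$$= \frac{9637 r^4 - 89640 r^3 + 288360 r^2 - 362880 r + 155520}{349920\, r^2},$$

where $A(\Gamma_1(x_1, N^r_{\mathcal{Y}})) = \frac{\sqrt{3}\,\bigl(9 r^2 + 18 - 24 r + 4\sqrt{3}\, r y - 18 x + 6 x^2 + 14 y^2 + 12 r x - 8\sqrt{3}\, x y - 6\sqrt{3}\, y\bigr)}{12\, r^2}$.
Similarly, we calculate for $j = 1, 2, 3, 5, 6$ and get

$$P(\{X_2, X_3\} \subset \Gamma_1(X_1, N^r_{\mathcal{Y}})) = 6 \left(\frac{-47880 r^5 - 38880 r^2 + 25687 r^6 - 1080 r^4 + 60480 r^3 + 3888}{349920\, r^4}\right) = \frac{-47880 r^5 - 38880 r^2 + 25687 r^6 - 1080 r^4 + 60480 r^3 + 3888}{58320\, r^4}.$$

Furthermore, $P(X_2 \in N^r_{\mathcal{Y}}(X_1), X_3 \in \Gamma_1(X_1, N^r_{\mathcal{Y}}), X_1 \in T(y_1, M_3, M_C)) = \sum_{j=1}^{6} P(X_2 \in N^r_{\mathcal{Y}}(X_1), X_3 \in \Gamma_1(X_1, N^r_{\mathcal{Y}}), X_1 \in R_j)$.

For example, for $x_1 \in R_4$, we get $-\frac{1}{466560}\, r^2 (207360 + 404640 r^2 - 483840 r - 142920 r^3 + 17687 r^4)$ by using
the same integration limits as above, with the integrand being $A(N^r_{\mathcal{Y}}(x_1))\, A(\Gamma_1(x_1, N^r_{\mathcal{Y}})) / A(T(\mathcal{Y}))^3$.

Similarly, we calculate for $j = 1, 2, 3, 5, 6$ and get

$$P(X_2 \in N^r_{\mathcal{Y}}(X_1), X_3 \in \Gamma_1(X_1, N^r_{\mathcal{Y}})) = 6 \left(\frac{5467}{2799360} r^6 - \frac{35}{2592} r^5 + \frac{37}{1296} r^4 - \frac{13}{648} r^2 + \frac{83}{12960}\right) = \frac{5467}{466560} r^6 - \frac{35}{432} r^5 + \frac{37}{216} r^4 - \frac{13}{108} r^2 + \frac{83}{2160}.$$

So, $E[h_{12} h_{13}] = [5467 r^{10} - 37800 r^9 + 89292 r^8 + 46588 r^6 - 191520 r^5 + 13608 r^4 + 241920 r^3 - 155520 r^2 + 15552]/[233280\, r^4]$.

Thus, for $r \in [4/3, 3/2)$, $\nu(r) = [5467 r^{10} - 37800 r^9 + 61912 r^8 + 46588 r^6 - 191520 r^5 + 13608 r^4 + 241920 r^3 - 155520 r^2 + 15552]/[233280\, r^4]$.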
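The assembly $\nu(r) = P^r_{2N} + 2P^r_M + P^r_{2G} - [2\mu(r)]^2$ on $[4/3, 3/2)$ can be verified with exact rational arithmetic. In the sketch below, $P^r_M$ and $P^r_{2G}$ are the polynomials stated above, while the constant in $P^r_{2N} = \frac{781}{19440} r^4$ is an assumption obtained by integrating the stated integrand $A(N^r_{\mathcal{Y}}(x_1))^2/A(T(\mathcal{Y}))^3$ over $T_s$; the function names are illustrative.

```python
from fractions import Fraction as F


def p_2n(r):
    return F(781, 19440) * r ** 4          # assumed constant, see lead-in


def p_m(r):
    return (F(5467, 466560) * r ** 6 - F(35, 432) * r ** 5
            + F(37, 216) * r ** 4 - F(13, 108) * r ** 2 + F(83, 2160))


def p_2g(r):
    num = (25687 * r ** 6 - 47880 * r ** 5 - 1080 * r ** 4
           + 60480 * r ** 3 - 38880 * r ** 2 + 3888)
    return num / (58320 * r ** 4)


def nu_assembled(r):
    mu = F(37, 216) * r ** 2               # mu(r) on [1, 3/2)
    return p_2n(r) + 2 * p_m(r) + p_2g(r) - (2 * mu) ** 2


def nu_closed_form(r):
    num = (5467 * r ** 10 - 37800 * r ** 9 + 61912 * r ** 8 + 46588 * r ** 6
           - 191520 * r ** 5 + 13608 * r ** 4 + 241920 * r ** 3
           - 155520 * r ** 2 + 15552)
    return num / (233280 * r ** 4)
```

Agreement of the two functions at several rational values of $r$ confirms that the three probabilities and the closed form are mutually consistent term by term.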



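Appendix 2: $\mu(r, \epsilon)$ for Segregation and Association Alternatives

Derivation of $\mu(r, \epsilon)$ involves detailed geometric calculations and partitioning of the space of $(r, \epsilon, x_1)$ for
$r \in [1, \infty)$, $\epsilon \in [0, \sqrt{3}/3)$, and $x_1 \in T_s$.

$\mu_S(r, \epsilon)$ Under Segregation Alternatives

Under segregation, we compute $\mu_S(r, \epsilon)$ explicitly. For $\epsilon \in [0, \sqrt{3}/8)$, $\mu_S(r, \epsilon) = \sum_{j=1}^{7} \mu_{1,j}(r, \epsilon)\, I(r \in \mathcal{I}_j)$

where

$$\mu_{1,1}(r, \epsilon) = -\frac{576 r^2 \epsilon^4 - 1152 \epsilon^4 - 37 r^2 + 288 \epsilon^2}{216\, (2\epsilon + 1)^2 (2\epsilon - 1)^2},$$

$$\mu_{1,2}(r, \epsilon) = -[576 r^4 \epsilon^4 - 1152 r^2 \epsilon^4 + 91 r^4 + 512 \sqrt{3}\, r^3 \epsilon + 2592 r^2 \epsilon^2 + 1536 \sqrt{3}\, r \epsilon^3 + 1152 \epsilon^4$$
$$- 768 r^3 - 2304 \sqrt{3}\, r^2 \epsilon - 6912 r \epsilon^2 - 2304 \sqrt{3}\, \epsilon^3 + 1728 r^2 + 3456 \sqrt{3}\, r \epsilon + 5184 \epsilon^2$$
$$- 1728 r - 1728 \sqrt{3}\, \epsilon + 648]/[216\, r^2 (2\epsilon + 1)^2 (2\epsilon - 1)^2],$$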

^1,3 (r,e) = -[192 r 4 e 4 - 384 rV + 9r 4 + 864 r 2 e 2 + 512 V3re a + 384 e 4 - 2304 re 2 - 768 \/3e 3 

- 288 r 2 + 1728 2 + 576 r - 324]/[72 r 2 (2 e + 1) 2 (2 e - l) 2 ], 

^1,4 (r, e) = -[192 r 4 e 4 - 384 rV - 9 r 4 - 96 \/3r 3 e + 288 r 22 - 128 e 4 + 144 r 3 + 576 \/3V 2 e + 256 

V^e 3 - 720 r 2 - 1152 y/Sre - 576 2 + 1152 r + 768 VSe - 612]/[72 r 2 (2 e + 1) 2 (2 e - l) 2 ], 

48 r 4 e 4 - 96 r 2 e 4 + 72 r 2 e 2 - 32 e 4 + 64 ^3e 3 - 18 r 2 - 144 e 2 + 27 
Mi,5(r,e) - 18r 2 (2e + l) 2 (2e- l) 2 ' 

48 r 4 e 4 + 256 r 3 e 4 - 128 v / 3r 3 e 3 + 288 r 2 e 4 - 192 v / 3r 2 e 3 + 72 rV + 18 r 2 + 48 V3e - 45 
Ml,6(r ' e) ~ 18 (2e + l) 2 (2e-l) 2 r 2 ' 

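$\mu_{1,7}(r, \epsilon) = 1$,

with the corresponding intervals $\mathcal{I}_1 = [1, 3/2 - \sqrt{3}\epsilon)$, $\mathcal{I}_2 = [3/2 - \sqrt{3}\epsilon, 3/2)$, $\mathcal{I}_3 = [3/2, 2 - 4\epsilon/\sqrt{3})$,
$\mathcal{I}_4 = [2 - 4\epsilon/\sqrt{3}, 2)$, $\mathcal{I}_5 = [2, \sqrt{3}/(2\epsilon) - 1)$, $\mathcal{I}_6 = [\sqrt{3}/(2\epsilon) - 1, \sqrt{3}/(2\epsilon))$, and $\mathcal{I}_7 = [\sqrt{3}/(2\epsilon), \infty)$.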

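For $\epsilon \in [\sqrt{3}/8, \sqrt{3}/6)$, $\mu_S(r, \epsilon) = \sum_{j=1}^{7} \mu_{2,j}(r, \epsilon)\, I(r \in \mathcal{I}_j)$ where $\mu_{2,j}(r, \epsilon) = \mu_{1,j}(r, \epsilon)$ for $j = 1, 2, 4, 5, 6$,
and for $j = 3, 7$,

$$\mu_{2,3}(r, \epsilon) = -[576 r^4 \epsilon^4 - 1152 r^2 \epsilon^4 + 37 r^4 + 224 \sqrt{3}\, r^3 \epsilon + 864 r^2 \epsilon^2 - 384 \epsilon^4 - 336 r^3 - 576 \sqrt{3}\, r^2 \epsilon$$
$$+ 768 \sqrt{3}\, \epsilon^3 + 432 r^2 - 1728 \epsilon^2 + 576 \sqrt{3}\, \epsilon - 216]/[216\, r^2 (2\epsilon + 1)^2 (2\epsilon - 1)^2],$$
$$\mu_{2,7}(r, \epsilon) = 1,$$

with the corresponding intervals $\mathcal{I}_1 = [1, 3/2 - \sqrt{3}\epsilon)$, $\mathcal{I}_2 = [3/2 - \sqrt{3}\epsilon, 2 - 4\epsilon/\sqrt{3})$, $\mathcal{I}_3 = [2 - 4\epsilon/\sqrt{3}, 3/2)$,
$\mathcal{I}_4 = [3/2, 2)$, $\mathcal{I}_5 = [2, \sqrt{3}/(2\epsilon) - 1)$, $\mathcal{I}_6 = [\sqrt{3}/(2\epsilon) - 1, \sqrt{3}/(2\epsilon))$, and $\mathcal{I}_7 = [\sqrt{3}/(2\epsilon), \infty)$.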

For e € [V3/6, \/3/4), /i s (r, e) = X^ =1 /"3,j(r, e) I(r € Ij) where /J 3 a( r ; e ) = Mi,2(>", e) and 

M3,2(r,e) = -[576r 4 e 4 - 1152 rV + 37 r 4 + 224 v^e + 864 rV - 384 e 4 - 336 r 3 - 576v / 3r 2 e 
+ 768 V3e 3 + 432 r 2 - 1728 e 2 + 576 V3e - 216]/[216 r 2 (2 e + 1) 2 (2 e - l) 2 ], 

M3,3(r, e) = [576 rV + 3072 re 4 - 1536 V3re 3 + 3456 e 4 - 2304 v^e 3 - 37 r 2 - 224 V3re 
+ 864 e 2 + 336 r + 576 V3e - 432]/[216 (2 e + 1) 2 (2 e - l) 2 ], 

M3,4(r,e) = [192r 4 e 4 + 1024rV - 512V3r 3 e 3 + 1152r 2 e 4 - 768 V3r 2 e 3 + 9 r 4 + 96 V3r 3 e + 288 r 22 

- 144 r 3 - 576 v^e + 720 r 2 + 1152 v^re - 1152 r - 576 v^e + 540]/[72 r 2 (2 e + 1) 2 (2 e - l) 2 ], 



with the corresponding intervals Xi = [1, 2-4e/V3), X 2 = [2-Ae/V3, v / 3/(2e)-l),T 3 = [V3/(2 e) - 1, 3/2), 
1 4 = [3/2,2), X 5 = [2, v / 3/(2e)), and 2s = [V3/(2 e), oo). 

For e € [\/3/4, V3/3), /i S (r, e) = 5Zf=i M4j(r, e) € Zj) where 



M4,2(r,e) = -[9r 4 e 4 - 4 v^A 3 + 48 r V - 48 v^e 3 - 90 rV + 36r 3 e 2 + 96 v / 3r 2 e 3 - 126 r 2 e 2 
- 32 V3re 3 - 48 e 4 + 36 v / 3r 2 e + 144 re 2 + 96 v^e 3 - 18 r 2 - 72 v^re - 216 2 + 36 r 
+ 72 v^e - 27]/[2 (3 e - \/3)V], 

M4,3(r, e) = 1, 

with the corresponding intervals X\ = [1,3 — 2 e/\/3), I2 = [3 — 2 e/v^, V^/e — 2), and I3 = [\/3/e — 2, 00). 



M3,6(r, e) 



M3,5(r, e) 




/"4,i(r-,e) 



9 rV + 2 y^e + 48 re 2 + r 2 - 16 VSre - 90 e 2 - 12 r + 36 Vie 
18 (3 e- v^) 2 



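$\mu_A(r, \epsilon)$ Under Association Alternatives

Under association, we compute $\mu_A(r, \epsilon)$ explicitly. For $\epsilon \in [0, (7\sqrt{3} - 3\sqrt{15})/12 \approx .042)$, $\mu_A(r, \epsilon) = \sum_{j=1}^{6} \mu_{1,j}(r, \epsilon)\, I(r \in \mathcal{I}_j)$ where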

/ii,i(r,e) = -[3456 eV + 9216 eV - 3072 VSe 3 r 4 - 17280 eV - 3072 ^3e 3 r 3 + 2304 eV 
+ 4608 V3e 3 r 2 - 2304 eV + 6336 e 4 + 6144 ^3e 3 r + 6912 eV + 512 V3e r 3 

- 101 r 4 - 6144 V3e 3 - 11520 2 r - 1536 ^3er 2 + 256 r 3 + 5760 2 + 1536 V3er 

- 384 r 2 - 512 V3e + 256 r - 64]/ [24 (6 e + \/3) 2 (6 e - V3)V], 

^1,2 (r,e) = -[1728eV - 1536 V3e 3 r 4 - 31104 eV + 1152 eV + 15552 e 4 + 10368 eV - 37r 4 

- 20736 2 r + 10368 e 2 ]/[24 (6 e + V3) 2 (6 e - V3) V], 

Mi,3(r, e) = [-2592 eV - 2304 VSeV - 46656 eV + 1728 eV + 10656 e 4 - 9216 \/3e 3 r 

+ 9072 eV - 432 73er 3 - 15 r 4 + 12288 \/3e 3 - 13824 e 2 r + 1728 ^3er 2 - 216 r 3 



jUi, 5 (r,e) = 
Hi,e(r,e) = 



+ 4032 e 2 - 2304 V3e r + 432 r 2 + 1024 V^e - 384 r + 128]/[36 (6 e + \/3) 2 (6 e - Vifr 2 ] , 

1728e 4 r 4 - 1536 v^eV - 31104 eV + 1152e 2 r 4 - 5184e 4 + 2592 eV - 37 r 4 - 3456e 2 



24(6e + V3) 2 (6e- V3) 2 r 2 
9 (1152 eV + 192 e 4 - 192 eV - r 4 + 128 e 2 + 32 r 2 - 64r + 36) 
8(6e + \/3) 2 (6e- ^) 2 r 2 _ ' 
9(r + 6)(r-2) 3 



8(6e + V3) 2 (6e- ^3) 2 r 2 ' 



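with the corresponding intervals $\mathcal{I}_1 = \left[1, \frac{1 + 2\sqrt{3}\epsilon}{1 - \sqrt{3}\epsilon}\right)$, $\mathcal{I}_2 = \left[\frac{1 + 2\sqrt{3}\epsilon}{1 - \sqrt{3}\epsilon}, \frac{4(1 - \sqrt{3}\epsilon)}{3}\right)$, $\mathcal{I}_3 = \left[\frac{4(1 - \sqrt{3}\epsilon)}{3}, \frac{4(1 + 2\sqrt{3}\epsilon)}{3}\right)$,
$\mathcal{I}_4 = \left[\frac{4(1 + 2\sqrt{3}\epsilon)}{3}, \frac{3}{2(1 - \sqrt{3}\epsilon)}\right)$, $\mathcal{I}_5 = \left[\frac{3}{2(1 - \sqrt{3}\epsilon)}, 2\right)$, and $\mathcal{I}_6 = [2, \infty)$.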



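For $\epsilon \in [(7\sqrt{3} - 3\sqrt{15})/12, \sqrt{3}/12)$, $\mu_A(r, \epsilon) = \sum_{j=1}^{6} \mu_{2,j}(r, \epsilon)\, I(r \in \mathcal{I}_j)$ where $\mu_{2,j}(r, \epsilon) = \mu_{1,j}(r, \epsilon)$ for
$j = 1, 3, 4, 5, 6$ and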

M2,2(r,e) = [-3456 eV + lllr 4 - 5184 eV + 4608 \/3eV - 336 VSe r 3 - 168 r 3 - 13824 eV 
+ 4608 v / 3e 3 r 3 + 3456 eV + 144 r 2 - 6912 V3e 3 r 2 - 3888 eV + 576 V3e r 2 
+ 25920 eV + 3168 e 4 + 2880 e 2 - 256 \/3e - 32 - 3072 V3e 3 ]/[36 {VS + 6 e) 2 (-6 e + V3) 2 r 2 ] 



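with the corresponding intervals $\mathcal{I}_1 = \left[1, \frac{4(1 - \sqrt{3}\epsilon)}{3}\right)$, $\mathcal{I}_2 = \left[\frac{4(1 - \sqrt{3}\epsilon)}{3}, \frac{1 + 2\sqrt{3}\epsilon}{1 - \sqrt{3}\epsilon}\right)$, $\mathcal{I}_3 = \left[\frac{1 + 2\sqrt{3}\epsilon}{1 - \sqrt{3}\epsilon}, \frac{4(1 + 2\sqrt{3}\epsilon)}{3}\right)$,
$\mathcal{I}_4 = \left[\frac{4(1 + 2\sqrt{3}\epsilon)}{3}, \frac{3}{2(1 - \sqrt{3}\epsilon)}\right)$, $\mathcal{I}_5 = \left[\frac{3}{2(1 - \sqrt{3}\epsilon)}, 2\right)$, and $\mathcal{I}_6 = [2, \infty)$.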

For e G [V3/12, V3/3), ^(r, e) = £ 3 =1 /i 3 j(r, e) I(r G X,-) where 
2r 2 -l 



M3,i(r, e) 



6 r 2 



/U3,2(r, e) = [432 e 4 r 4 + 1152 eV - 576 V3e 3 r 4 + 1296 e 4 r 2 - 960 V3e 3 r 3 + 864 e 2 r 4 - 864 V3e 3 r 2 



M3,3(r, e) 



576 e 2 r 3 - 192 V3e r 4 - 360 e 4 + 648 e 2 r 2 + 64 V3e r 3 + 4 
64 r 3 - 504 e 2 + 72 r 2 + 88 V3e - 25] / [16 (3 e - V3) 4 r 2 ] , 
-54 e 2 r 2 + 36 y/3e r 2 + 15 e 2 - 18 r 2 + 2 \/3e + 20 



- 4 + 192 V3e 3 - 144V3ei 



6(-3e + \/3) 2 r 2 



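with the corresponding intervals $\mathcal{I}_1 = \left[1, \frac{1 + 2\sqrt{3}\epsilon}{2(1 - \sqrt{3}\epsilon)}\right)$, $\mathcal{I}_2 = \left[\frac{1 + 2\sqrt{3}\epsilon}{2(1 - \sqrt{3}\epsilon)}, \frac{3}{2(1 - \sqrt{3}\epsilon)}\right)$, and $\mathcal{I}_3 = \left[\frac{3}{2(1 - \sqrt{3}\epsilon)}, \infty\right)$.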



Appendix 3: $\mu(r, \epsilon)$ and $\nu(r, \epsilon)$ for Segregation and Association Alternatives with Sample $\epsilon$ values



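With $\epsilon = \sqrt{3}/4$ and $r \in [1,2)$,

$$
\mu_S(r,\sqrt{3}/4) =
\begin{cases}
-\frac{67}{54}\,r^2 + \frac{40}{9}\,r - 3 & \text{for } r \in [1, 3/2), \\[4pt]
\dfrac{7\,r^4 - 48\,r^3 + 122\,r^2 - 128\,r + 48}{2\,r^2} & \text{for } r \in [3/2, 2),
\end{cases}
$$

and $\nu_S(r,\sqrt{3}/4) = \sum_{j=1}^{5} \vartheta_j(r,\sqrt{3}/4)\,\mathbf{I}(r \in \mathcal{I}_j)$ where

$$
\begin{aligned}
\vartheta_1(r,\sqrt{3}/4) = -\bigl[ & 14285\,r^7 - 28224\,r^6 - 233266\,r^5 + 1106688\,r^4 - 2021199\,r^3 + 1876608\,r^2 \\
& - 880794\,r + 165888 \bigr] \big/ \bigl[ 3645\,r \bigr], \\
\vartheta_2(r,\sqrt{3}/4) = -\bigl[ & 14285\,r^{10} - 28224\,r^9 - 233266\,r^8 + 1106688\,r^7 - 1234767\,r^6 - 3431808\,r^5 \\
& + 14049126\,r^4 - 22228992\,r^3 + 18895680\,r^2 - 8503056\,r + 1594323 \bigr] \big/ \bigl[ 3645\,r^4 \bigr], \\
\vartheta_3(r,\sqrt{3}/4) = -\bigl[ & 14285\,r^{10} - 28224\,r^9 - 233266\,r^8 + 1106688\,r^7 - 2545713\,r^6 + 5903280\,r^5 \\
& - 13456044\,r^4 + 20636208\,r^3 - 18305190\,r^2 + 8503056\,r - 1594323 \bigr] \big/ \bigl[ 3645\,r^4 \bigr], \\
\vartheta_4(r,\sqrt{3}/4) = \bigl[ & 104920\,r^8 - 111072\,r^7 + 1992132\,r^6 - 15844032\,r^5 + 50174640\,r^4 + 6377292 \\
& - 34012224\,r + 73220760\,r^2 - 81881280\,r^3 + 1909\,r^{10} - 27072\,r^9 \bigr] \big/ \bigl[ 14580\,r^4 \bigr], \\
\vartheta_5(r,\sqrt{3}/4) = -\bigl[ & -1187904\,r^5 + 1331492\,r^6 + 433304\,r^2 + 611163\,r^{10} - 850240\,r^9 - 198144\,r \\
& + 955392\,r^4 - 705536\,r^3 - 387680\,r^{11} + 1118472\,r^8 - 1308960\,r^7 + 175984\,r^{12} \\
& - 46176\,r^{13} + 5120\,r^{14} + 56016 \bigr] \big/ \bigl[ 20\,r^4 \bigr],
\end{aligned}
$$

and the corresponding intervals are $\mathcal{I}_1 = [1, 9/8)$, $\mathcal{I}_2 = [9/8, 9/7)$, $\mathcal{I}_3 = [9/7, 4/3)$, $\mathcal{I}_4 = [4/3, 3/2)$, $\mathcal{I}_5 = [3/2, 2)$.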
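The five pieces of $\nu_S(r,\sqrt{3}/4)$ should join continuously at $r = 9/8,\ 9/7,\ 4/3,\ 3/2$. A minimal sketch of that check in exact rational arithmetic follows; the helper `poly`, the coefficient lists `P1` to `P5`, and the function `nu_S` are hypothetical names, and the coefficients are a transcription of the formulas above, so the check is only as good as that transcription.

```python
from fractions import Fraction as F


def poly(coeffs, r):
    """Horner evaluation; coeffs run from the highest power down to the constant."""
    acc = F(0)
    for c in coeffs:
        acc = acc * r + c
    return acc


# Numerator coefficients, highest power first.
P1 = [14285, -28224, -233266, 1106688, -2021199, 1876608, -880794, 165888]
P2 = [14285, -28224, -233266, 1106688, -1234767, -3431808, 14049126,
      -22228992, 18895680, -8503056, 1594323]
P3 = [14285, -28224, -233266, 1106688, -2545713, 5903280, -13456044,
      20636208, -18305190, 8503056, -1594323]
P4 = [1909, -27072, 104920, -111072, 1992132, -15844032, 50174640,
      -81881280, 73220760, -34012224, 6377292]
P5 = [5120, -46176, 175984, -387680, 611163, -850240, 1118472, -1308960,
      1331492, -1187904, 955392, -705536, 433304, -198144, 56016]

KNOTS = [F(9, 8), F(9, 7), F(4, 3), F(3, 2)]

PIECES = [
    lambda r: -poly(P1, r) / (3645 * r),
    lambda r: -poly(P2, r) / (3645 * r**4),
    lambda r: -poly(P3, r) / (3645 * r**4),
    lambda r: poly(P4, r) / (14580 * r**4),
    lambda r: -poly(P5, r) / (20 * r**4),
]


def nu_S(r):
    """Evaluate the piecewise variance for 1 <= r < 2."""
    r = F(r)
    if not (1 <= r < 2):
        raise ValueError("r must lie in [1, 2)")
    index = sum(1 for k in KNOTS if r >= k)  # number of knots at or below r
    return PIECES[index](r)


if __name__ == "__main__":
    for k, knot in enumerate(KNOTS):
        print(knot, PIECES[k](knot) == PIECES[k + 1](knot))
```

Using `Fraction` rather than floating point matters here: the numerators involve cancellations of terms of order $10^8$, which would otherwise mask small discrepancies.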

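With $\epsilon = \sqrt{3}/12$,

$$
\mu_A(r,\sqrt{3}/12) =
\begin{cases}
\dfrac{6\,r^4 - 16\,r^3 + 18\,r^2 - 5}{18\,r^2} & \text{for } r \in [1, 2), \\[4pt]
-\dfrac{37}{18}\,r^{-2} + 1 & \text{for } r \in [2, \infty),
\end{cases}
$$

and $\nu_A(r,\sqrt{3}/12) = \sum_{j=1}^{3} \vartheta_j(r,\sqrt{3}/12)\,\mathbf{I}(r \in \mathcal{I}_j)$ where

$$
\begin{aligned}
\vartheta_1(r,\sqrt{3}/12) = \bigl[ & 10\,r^{12} - 96\,r^{11} + 240\,r^{10} + 192\,r^9 - 1830\,r^8 + 3360\,r^7 - 2650\,r^6 + 240\,r^5 + 1383\,r^4 \\
& - 1280\,r^3 + 540\,r^2 - 144\,r + 35 \bigr] \big/ \bigl[ 405\,r^6 \bigr], \\
\vartheta_2(r,\sqrt{3}/12) = \bigl[ & 10\,r^{12} - 96\,r^{11} + 240\,r^{10} + 192\,r^9 - 1670\,r^8 + 2784\,r^7 - 2650\,r^6 + 2400\,r^5 - 1047\,r^4 \\
& - 1280\,r^3 + 1269\,r^2 - 144\,r + 35 \bigr] \big/ \bigl[ 405\,r^6 \bigr], \\
\vartheta_3(r,\sqrt{3}/12) = {} & \frac{537\,r^4 - 683\,r^2 - 2448\,r + 1315}{405\,r^6}.
\end{aligned}
$$

The corresponding intervals are $\mathcal{I}_1 = [1, 3/2)$, $\mathcal{I}_2 = [3/2, 2)$, $\mathcal{I}_3 = [2, \infty)$.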
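The two sample means can be evaluated directly from their piecewise forms. The sketch below is illustrative rather than part of the original appendix: the function names are hypothetical, and the coefficients are those of $\mu_S(r,\sqrt{3}/4)$ and $\mu_A(r,\sqrt{3}/12)$ as reconstructed in this appendix. It uses exact rationals so that continuity at the interior endpoints can be tested with equality.

```python
from fractions import Fraction as F


def mu_S_low(r):
    """mu_S(r, sqrt(3)/4) on [1, 3/2)."""
    r = F(r)
    return -F(67, 54) * r**2 + F(40, 9) * r - 3


def mu_S_high(r):
    """mu_S(r, sqrt(3)/4) on [3/2, 2)."""
    r = F(r)
    return (7 * r**4 - 48 * r**3 + 122 * r**2 - 128 * r + 48) / (2 * r**2)


def mu_A_low(r):
    """mu_A(r, sqrt(3)/12) on [1, 2)."""
    r = F(r)
    return (6 * r**4 - 16 * r**3 + 18 * r**2 - 5) / (18 * r**2)


def mu_A_high(r):
    """mu_A(r, sqrt(3)/12) on [2, infinity)."""
    r = F(r)
    return 1 - F(37, 18) / r**2


def mu_S(r):
    r = F(r)
    if not (1 <= r < 2):
        raise ValueError("mu_S is tabulated here only for 1 <= r < 2")
    return mu_S_low(r) if r < F(3, 2) else mu_S_high(r)


def mu_A(r):
    r = F(r)
    if r < 1:
        raise ValueError("r must be at least 1")
    return mu_A_low(r) if r < 2 else mu_A_high(r)


if __name__ == "__main__":
    print(mu_S(F(3, 2)), mu_A(2))
```

Both functions take values in $[0,1]$ on the ranges shown, as an arc probability must.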

Appendix 4: Proof of Corollary 1 

In the multiple triangle case, 

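$$
\mu(r,J) = \mathbf{E}[\rho_n(r)] = \frac{1}{n(n-1)} \sum\sum_{i<j} \mathbf{E}[h_{ij}] = \frac{1}{2}\,\mathbf{E}[h_{12}] = \mathbf{E}[\mathbf{I}(A_{12})] = P(A_{12}) = P\bigl(X_2 \in N_{\mathcal{Y}}^r(X_1)\bigr).
$$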



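But, by definition of $N_{\mathcal{Y}}^r(\cdot)$, $P\bigl(X_2 \in N_{\mathcal{Y}}^r(X_1)\bigr) = 0$ if $X_1$ and $X_2$ are in different triangles. So, by the law of total probability,

$$
\begin{aligned}
\mu(r,J) := P\bigl(X_2 \in N_{\mathcal{Y}}^r(X_1)\bigr) &= \sum_{j=1}^{J} P\bigl(X_2 \in N_{\mathcal{Y}}^r(X_1) \,\big|\, \{X_1,X_2\} \subset T_j\bigr)\, P\bigl(\{X_1,X_2\} \subset T_j\bigr) \\
&= \sum_{j=1}^{J} \mu(r)\, P\bigl(\{X_1,X_2\} \subset T_j\bigr) \quad \bigl(\text{since } P\bigl(X_2 \in N_{\mathcal{Y}}^r(X_1) \,\big|\, \{X_1,X_2\} \subset T_j\bigr) = \mu(r)\bigr) \\
&= \mu(r) \sum_{j=1}^{J} \bigl(A(T_j)/A(C_H(\mathcal{Y}))\bigr)^2 \quad \bigl(\text{since } P\bigl(\{X_1,X_2\} \subset T_j\bigr) = \bigl(A(T_j)/A(C_H(\mathcal{Y}))\bigr)^2\bigr).
\end{aligned}
$$

Letting $w_j := A(T_j)/A(C_H(\mathcal{Y}))$, we get $\mu(r,J) = \mu(r) \cdot \bigl(\sum_{j=1}^{J} w_j^2\bigr)$, where $\mu(r)$ is the one-triangle mean given earlier in the paper.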
Furthermore, the asymptotic variance is 

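$$
\begin{aligned}
\nu(r,J) &= \mathbf{E}[h_{12}h_{13}] - \mathbf{E}[h_{12}]\,\mathbf{E}[h_{13}] \\
&= P\bigl(\{X_2,X_3\} \subset N_{\mathcal{Y}}^r(X_1)\bigr) + 2\,P\bigl(X_2 \in N_{\mathcal{Y}}^r(X_1),\ X_3 \in \Gamma_1(X_1, N_{\mathcal{Y}}^r)\bigr) \\
&\quad + P\bigl(\{X_2,X_3\} \subset \Gamma_1(X_1, N_{\mathcal{Y}}^r)\bigr) - 4\,\bigl(\mu(r,J)\bigr)^2.
\end{aligned}
$$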

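Then for $J > 1$, we have

$$
\begin{aligned}
P\bigl(\{X_2,X_3\} \subset N_{\mathcal{Y}}^r(X_1)\bigr) &= \sum_{j=1}^{J} P\bigl(\{X_2,X_3\} \subset N_{\mathcal{Y}}^r(X_1) \,\big|\, \{X_1,X_2,X_3\} \subset T_j\bigr)\, P\bigl(\{X_1,X_2,X_3\} \subset T_j\bigr) \\
&= \sum_{j=1}^{J} P_{2N}^r \bigl(A(T_j)/A(C_H(\mathcal{Y}))\bigr)^3 = P_{2N}^r \Bigl(\sum_{j=1}^{J} w_j^3\Bigr).
\end{aligned}
$$

Similarly, $P\bigl(X_2 \in N_{\mathcal{Y}}^r(X_1),\ X_3 \in \Gamma_1(X_1, N_{\mathcal{Y}}^r)\bigr) = P_M^r \bigl(\sum_{j=1}^{J} w_j^3\bigr)$ and $P\bigl(\{X_2,X_3\} \subset \Gamma_1(X_1, N_{\mathcal{Y}}^r)\bigr) = P_{2G}^r \bigl(\sum_{j=1}^{J} w_j^3\bigr)$. Since $P_{2N}^r + 2P_M^r + P_{2G}^r = \nu(r) + 4\mu(r)^2$ in the one-triangle case, it follows that

$$
\nu(r,J) = \bigl(P_{2N}^r + 2P_M^r + P_{2G}^r\bigr) \Bigl(\sum_{j=1}^{J} w_j^3\Bigr) - 4\,\bigl(\mu(r,J)\bigr)^2 = \nu(r) \Bigl(\sum_{j=1}^{J} w_j^3\Bigr) + 4\mu(r)^2 \Bigl(\sum_{j=1}^{J} w_j^3 - \Bigl(\sum_{j=1}^{J} w_j^2\Bigr)^2\Bigr),
$$

so conditional on $\mathcal{W}$, if $\nu(r,J) > 0$ then $\sqrt{n}\,\bigl(\rho_n(r) - \mu(r,J)\bigr) \xrightarrow{\mathcal{L}} \mathcal{N}\bigl(0, \nu(r,J)\bigr)$. ■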
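The multi-triangle moments depend on the triangle areas only through $\sum_j w_j^2$ and $\sum_j w_j^3$. The following is a minimal sketch of that computation, assuming the one-triangle values $\mu(r)$ and $\nu(r)$ are supplied by the caller; the function name `multi_triangle_moments` and the numerical inputs in the usage lines are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def multi_triangle_moments(areas, mu_r, nu_r):
    """Return (mu(r, J), nu(r, J)) from triangle areas and one-triangle moments.

    areas : areas A(T_j) of the J triangles
    mu_r  : one-triangle mean mu(r)
    nu_r  : one-triangle asymptotic variance nu(r)
    """
    total = sum(Fraction(a) for a in areas)
    w = [Fraction(a) / total for a in areas]      # weights w_j, summing to 1
    s2 = sum(x**2 for x in w)                      # sum of w_j^2
    s3 = sum(x**3 for x in w)                      # sum of w_j^3
    mu_J = mu_r * s2
    nu_J = nu_r * s3 + 4 * mu_r**2 * (s3 - s2**2)
    return mu_J, nu_J


if __name__ == "__main__":
    # Illustrative (not source-derived) one-triangle values.
    mu_r, nu_r = Fraction(1, 4), Fraction(1, 10)
    print(multi_triangle_moments([1], mu_r, nu_r))        # J = 1 recovers (mu, nu)
    print(multi_triangle_moments([1, 1], mu_r, nu_r))     # two equal triangles
    print(multi_triangle_moments([1, 2, 3], mu_r, nu_r))  # unequal areas
```

For $J$ equal-area triangles, $\sum_j w_j^2 = 1/J$ and $\sum_j w_j^3 = 1/J^2$, so the correction term vanishes and $\nu(r,J) = \nu(r)/J^2$. In general $\sum_j w_j^3 \ge \bigl(\sum_j w_j^2\bigr)^2$ by the Cauchy-Schwarz inequality, so the correction term is never negative.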



